
Scan to Loop roadmap #1072

Open
mathieupoumeyrolsonos opened this issue May 2, 2023 · 1 comment
mathieupoumeyrolsonos commented May 2, 2023

Step 0: Preliminaries

  • cancel chunk support

Step 1: Replace Scan by Loop

  • support ONNX Loop
  • simplify (?) the scan code by splitting it into separate ops
    • state management
    • scan slice and concat logic
  • caveat: batch i/o extraction may become harder (recognising a scan input will need some pattern matching)

State management

  • repurpose new operators introduced for pulsed conv state management
  • use skip to lock state at its initial value up to the right time
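A minimal sketch of the skip idea, with illustrative names (not tract's actual operator API): the state is held at its initial value for the first `skip` iterations and only then starts being updated.

```python
# Hypothetical sketch of "skip"-style state locking: the state keeps its
# initial value for the first `skip` iterations and is only updated after
# that point. `run_with_skip` and its arguments are illustrative names.
def run_with_skip(init, update, n_steps, skip):
    state = init
    for step in range(n_steps):
        if step >= skip:
            state = update(state)
    return state

# State is locked at 0 for steps 0 and 1, then incremented three times.
print(run_with_skip(0, lambda s: s + 1, 5, 2))  # -> 3
```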

Processing input

  • X is the full pulse input
  • DynamicSlice to extract the nth hyperplane in the tensor
  • a way to manage "n"
    • if the input is a scan, n_max is known
  • batch input extraction optimisation:
    • detect input/loop boundary/DynamicSlice
    • => always perform the extraction without looking at how n actually behaves, assuming we are going to use all (or most) of the slices
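A minimal sketch of the DynamicSlice extraction described above, using nested Python lists as stand-in tensors (the function name is illustrative, not tract's API):

```python
# Extract the n-th hyperplane of a tensor along a given axis.
# Axis 0 just indexes; deeper axes recurse into each sub-tensor.
def dynamic_slice(tensor, axis, n):
    if axis == 0:
        return tensor[n]
    return [dynamic_slice(sub, axis - 1, n) for sub in tensor]

t = [[1, 2, 3], [4, 5, 6]]     # shape (2, 3)
print(dynamic_slice(t, 0, 1))  # second row -> [4, 5, 6]
print(dynamic_slice(t, 1, 2))  # third column -> [3, 6]
```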

Processing output

  • if n is known (scan), easy
  • if n unknown
    • could work with plain concat (but will copy output at each loop iteration)
    • MutSlice could also reallocate the tensor if n > len
  • other option: hidden tensorseq
    • store inside the loop the stack of tensors to concat
    • after the loop, access this list to build the "full scan" output
    • if we don't want tensor sequences, the transmission between these two ops would have to go through the state...
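To make the cost trade-off concrete, here is a sketch (plain Python lists standing in for tensors, illustrative names) contrasting plain concat, which copies the accumulated output at every iteration, with the reallocate-when-full strategy a MutSlice could use when n > len:

```python
# Plain concat: the accumulated output is copied on every iteration,
# O(n^2) copying overall.
def concat_outputs(chunks):
    out = []
    for c in chunks:
        out = out + [c]
    return out

# Geometric reallocation: only copy when the buffer runs out, doubling
# capacity each time, for O(n) amortized copying.
def realloc_outputs(chunks):
    cap, n, out = 1, 0, [None]
    for c in chunks:
        if n == cap:
            cap *= 2
            out = out + [None] * (cap - n)
        out[n] = c
        n += 1
    return out[:n]

print(concat_outputs([1, 2, 3]))   # -> [1, 2, 3]
print(realloc_outputs([1, 2, 3]))  # -> [1, 2, 3]
```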

Unrolling loops?

  • when N is known and small (small pulse...)
  • useful for LSTM & co on loop-less runtimes
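A sketch of what unrolling means here (helper name is hypothetical): when N is known, the body can be composed with itself N times ahead of execution, leaving code with no loop construct for the runtime to support.

```python
from functools import reduce

# Compose `body` with itself n times at "compile" time; the resulting
# function is a fixed chain of body applications, no runtime loop needed.
def unroll(body, n):
    return reduce(lambda f, g: (lambda s: g(f(s))), [body] * n, lambda s: s)

step3 = unroll(lambda s: s + 1, 3)
print(step3(10))  # -> 13
```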

Current choice:

  • N is implemented manually inside the body, not by the loop
    • three major cases:
      1. scan: n takes all consecutive values from 0 to n_max
      2. open loop: n counts up to a dynamic EOL condition
      3. general case: n is a scalar doing any random access (use case unknown)
    • we want 1. to be competitive with scan, 2. to be reasonably fast, and 3. merely to work
  • batch input extraction is done without looking at the scalar index value on the dynamic slice
  • batch output extraction: same logic. We will not check what N is doing, we will just match an "axis-wise" operator followed by a MutSlice (or whatever the name is)
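A sketch of the chosen design: the loop construct itself knows nothing about n; the body carries the index as part of its state and decides both the next index and termination (all names here are illustrative):

```python
# Generic loop: repeatedly run the body until it reports it is done.
# The loop has no notion of an index; that lives entirely in the state.
def run_loop(body, state):
    while True:
        done, state = body(state)
        if done:
            return state

# Case 1 (scan-like): n walks 0..n_max inside the body, accumulating xs[n].
xs = [10, 20, 30]
def scan_body(state):
    n, acc = state
    if n == len(xs):
        return True, state
    return False, (n + 1, acc + [xs[n]])

print(run_loop(scan_body, (0, [])))  # -> (3, [10, 20, 30])
```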

Step 2: flatten subgraphs

  • main goal: simplify axis analysis and loop input/output batch extraction
  • how to implement loop control flow?
    • aka: do we want runnable models to use subgraphs? or conditional jumps in eval order?
  • nnef: do we want a loop {} construct in nnef instead of the subgraph?
mathieupoumeyrolsonos commented

Possible next step:

  • split away the scanning logic and index management from scan, materializing them in the body. No optimisation at this stage.

Long term, an nnef extension in this spirit:

graph body_rec(xs) -> ys {

    i = 0;
    ys = zeroes[shape(xs)];
    c = zeroes[...];

    loop {
        break if i == shape(xs);
        x = xs[i];
        ys = assign_slice(ys, i, F(x, c));
        c = C(x, c);
        i = i + 1;
    }

    ys
}
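For reference, an executable Python rendering of the body_rec sketch above, with F producing the per-step output slice and C updating the carried state (F, C, and the initial state are placeholders):

```python
# Mirror of the nnef sketch: iterate over xs, writing F(x, c) into the
# i-th slice of ys and threading the carried state c through C.
def body_rec(xs, F, C, c0):
    ys = [None] * len(xs)
    c = c0
    for i, x in enumerate(xs):
        ys[i] = F(x, c)  # ys = assign_slice(ys, i, F(x, c))
        c = C(x, c)      # c = C(x, c)
    return ys

# Running sum: each output slice is the input plus the sum carried so far.
print(body_rec([1, 2, 3], lambda x, c: x + c, lambda x, c: x + c, 0))
# -> [1, 3, 6]
```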
