We will support different attention approaches. Candle provides us with a broad variety of existing implementations.
Type | Status / Description |
---|---|
SelfAttention | Integrated - operates on a single input sequence. Depending on the implementation it is also called dot-product attention or global attention (see the sketch below the table). |
CrossAttention (aka Co-Attention) | Not integrated yet - operates on multiple input sequences. |
CausalSelfAttention | Not integrated yet - operates on parts of one or multiple input sequences, e.g., only the tokens before the present one. Depending on the implementation it is also called local attention. |
MultiHeadAttention | Not integrated yet - answers multiple concerns/questions in parallel, each with its own head. |
MultiQueryAttention | Not integrated yet - answers multiple concerns/questions while sharing a single set of keys and values across them. |
GroupQueryAttention | Not integrated yet - builds logical groups among the questions, with each group sharing its keys and values. |
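
The core computation behind SelfAttention is `softmax(QK^T / sqrt(d)) V`; masking out future positions before the softmax turns it into CausalSelfAttention. The following is a minimal sketch assuming `candle-core` and `candle-nn` as dependencies; the function `self_attention` and its `causal` flag are illustrative names, not Candle API.

```rust
use candle_core::{Device, Result, Tensor};
use candle_nn::ops::softmax_last_dim;

/// softmax(Q K^T / sqrt(d)) V over a single (seq_len, dim) sequence.
/// With `causal = true`, token i only attends to tokens j <= i.
fn self_attention(q: &Tensor, k: &Tensor, v: &Tensor, causal: bool) -> Result<Tensor> {
    let (seq_len, dim) = q.dims2()?;
    // Attention scores (seq_len, seq_len), scaled to keep the softmax stable.
    let mut scores = (q.matmul(&k.t()?)? / (dim as f64).sqrt())?;
    if causal {
        // Mask future positions (j > i) with -inf before the softmax.
        let mask: Vec<u8> = (0..seq_len)
            .flat_map(|i| (0..seq_len).map(move |j| u8::from(j > i)))
            .collect();
        let mask = Tensor::from_slice(&mask, (seq_len, seq_len), q.device())?;
        let neg_inf = Tensor::new(f32::NEG_INFINITY, q.device())?
            .broadcast_as((seq_len, seq_len))?;
        scores = mask.where_cond(&neg_inf, &scores)?;
    }
    // Row-wise attention weights, then the weighted sum of the values.
    softmax_last_dim(&scores)?.matmul(v)
}

fn main() -> Result<()> {
    let device = Device::Cpu;
    let x = Tensor::randn(0f32, 1f32, (4, 8), &device)?;
    // A real layer would apply learned Q/K/V projections first; we reuse `x`.
    let global = self_attention(&x, &x, &x, false)?;
    let causal = self_attention(&x, &x, &x, true)?;
    println!("{:?} {:?}", global.shape(), causal.shape());
    Ok(())
}
```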
Terms:
- Heads: number of parallel questions on a given stream.
- Contexts: number of parallel streams.
- Temporal: the time dimension (sequence positions).
- Spatial: the feature/space dimensionality.
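
To make these terms concrete, here is an assumed tensor layout that matches how most multi-head attention implementations arrange their inputs (the shape names are ours, not a Candle convention):

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    // (contexts, heads, temporal, spatial) == (batch, num_heads, seq_len, head_dim)
    let (contexts, heads, temporal, spatial) = (2usize, 4, 16, 8);
    let q = Tensor::randn(0f32, 1f32, (contexts, heads, temporal, spatial), &Device::Cpu)?;
    assert_eq!(q.dims(), &[contexts, heads, temporal, spatial]);
    Ok(())
}
```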
Note: All attention variants should be available for multiple dimensions. This includes the spatial transformer, which acts in >= 2D space (= spatial) as required for CNN applications.
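
As a small illustration of the spatial case: a CNN feature map can be flattened into a token sequence so that the same sequence attention applies spatially. The shapes below are assumptions for the sketch:

```rust
use candle_core::{Device, Result, Tensor};

fn main() -> Result<()> {
    let (h, w, c) = (8usize, 8, 16);
    // A CNN-style feature map: height x width x channels.
    let feature_map = Tensor::randn(0f32, 1f32, (h, w, c), &Device::Cpu)?;
    // Flatten the 2D spatial grid into h*w tokens of dimension c.
    let tokens = feature_map.reshape((h * w, c))?;
    assert_eq!(tokens.dims(), &[h * w, c]);
    Ok(())
}
```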
More complex models are mapped as their own layers: