Commit

Bump v2.6.0
tridao committed Jul 11, 2024
1 parent d0787ac commit da11d1b
Showing 3 changed files with 8 additions and 3 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -314,6 +314,11 @@ Implement deterministic backward pass. Thanks to engineers from [Meituan](www.meituan.com)
Support paged KV cache (i.e., [PagedAttention](https://arxiv.org/abs/2309.06180)).
Thanks to @beginlner for this contribution.

### 2.6: Softcapping.

Support attention with softcapping, as used in Gemma-2 and Grok models.
Thanks to @Narsil for this contribution.

## Performance

We present expected speedup (combined forward + backward pass) and memory savings from using FlashAttention against PyTorch standard attention, depending on sequence length, on different GPUs (speedup depends on memory bandwidth - we see more speedup on slower GPU memory).
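The softcapping support noted in the README change above corresponds to a new softcap option on the attention functions. A minimal sketch of how it might be used, assuming `flash_attn_func` exposes a `softcap` keyword argument in this release (softcapping squashes the attention scores as `softcap * tanh(scores / softcap)` before the softmax; the cap value below is illustrative, not taken from any particular model):

```python
# Hypothetical usage sketch for the softcapping option added in v2.6.0.
# Assumes flash_attn_func accepts a `softcap` keyword in this release; check
# flash_attn/flash_attn_interface.py for the exact argument name and default.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 16, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.float16)
k = torch.randn_like(q)
v = torch.randn_like(q)

# softcap > 0 squashes attention scores with softcap * tanh(scores / softcap)
# before the softmax (as in Gemma-2 / Grok style attention); 0.0 disables it.
out = flash_attn_func(q, k, v, causal=True, softcap=50.0)
print(out.shape)  # (batch, seqlen, nheads, headdim)
```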
2 changes: 1 addition & 1 deletion flash_attn/__init__.py
@@ -1,4 +1,4 @@
__version__ = "2.5.9.post1"
__version__ = "2.6.0"

from flash_attn.flash_attn_interface import (
flash_attn_func,
4 changes: 2 additions & 2 deletions training/Dockerfile
@@ -85,7 +85,7 @@ RUN pip install transformers==4.25.1 datasets==2.8.0 pytorch-lightning==1.8.6 tr
RUN pip install git+https://github.com/mlcommons/[email protected]

# Install FlashAttention
-RUN pip install flash-attn==2.5.9.post1
+RUN pip install flash-attn==2.6.0

# Install CUDA extensions for fused dense
-RUN pip install git+https://github.com/HazyResearch/flash-attention@v2.5.9.post1#subdirectory=csrc/fused_dense_lib
+RUN pip install git+https://github.com/HazyResearch/flash-attention@v2.6.0#subdirectory=csrc/fused_dense_lib
