Add paged attention optimizations #30

jorgeantonio21 · 2024-04-08T10:41:19Z

To make the inference service fast, we need to implement a few optimisations on top of Candle. These will include an implementation of paged attention (see here for an existing implementation that already uses Candle).
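
For context, the bookkeeping side of paged attention can be sketched as below. This is only an illustrative sketch in plain Rust, not the linked implementation and not a Candle API: `BlockManager`, `BLOCK_SIZE`, and all method names are hypothetical. It shows the core idea that the KV cache is split into fixed-size physical blocks, and each sequence keeps a block table mapping its logical blocks to whichever physical blocks happen to be free.

```rust
use std::collections::HashMap;

/// Token slots per physical KV-cache block (hypothetical value for this sketch).
pub const BLOCK_SIZE: usize = 16;

/// Hypothetical allocator: hands out fixed-size physical blocks and records,
/// per sequence, which physical blocks back its logical KV-cache positions.
pub struct BlockManager {
    free_blocks: Vec<usize>,
    block_tables: HashMap<u64, Vec<usize>>,
    seq_lens: HashMap<u64, usize>,
}

impl BlockManager {
    pub fn new(num_blocks: usize) -> Self {
        Self {
            // Reversed so that `pop()` hands out block 0 first.
            free_blocks: (0..num_blocks).rev().collect(),
            block_tables: HashMap::new(),
            seq_lens: HashMap::new(),
        }
    }

    /// Reserve the slot for the next token of `seq_id`.
    /// Returns `(physical_block, offset_in_block)`, or `None` if no block is free.
    pub fn append_token(&mut self, seq_id: u64) -> Option<(usize, usize)> {
        let len = self.seq_lens.get(&seq_id).copied().unwrap_or(0);
        let offset = len % BLOCK_SIZE;
        if offset == 0 {
            // The sequence is new or its last block is full: grab a fresh block.
            let block = self.free_blocks.pop()?;
            self.block_tables.entry(seq_id).or_default().push(block);
        }
        let block = *self.block_tables.get(&seq_id)?.last()?;
        self.seq_lens.insert(seq_id, len + 1);
        Some((block, offset))
    }

    /// Return all blocks of a finished sequence to the free pool.
    pub fn free_sequence(&mut self, seq_id: u64) {
        if let Some(blocks) = self.block_tables.remove(&seq_id) {
            self.free_blocks.extend(blocks);
        }
        self.seq_lens.remove(&seq_id);
    }

    pub fn num_free_blocks(&self) -> usize {
        self.free_blocks.len()
    }
}
```

In a real integration, the `(block, offset)` pair would index into preallocated key/value tensors, and the attention kernel would read each sequence's block table to gather its keys and values. That part is omitted here.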

jorgeantonio21 · 2024-04-08T10:42:01Z

We should also optimise our stack with caching mechanisms for the inference service; see here for an existing implementation using Candle.

jorgeantonio21 added the AI label Apr 30, 2024
