[FR] RAG: Add support for Int8 embeddings #118

svilupp · 2024-04-03T08:24:28Z

It would be great to have support for embeddings compressed to Int8 as per HuggingFace: Embedding Quantization.

A potential implementation would be to:

- Define an embedder (`<:AbstractEmbedder`, for `get_embeddings`) and the corresponding finder (`<:AbstractSimilarityFinder`, for `find_similar`).
- Both would hold the vectors plus `min_values` and `max_values` fields with the effective range of each embedding dimension (e.g., `length(min_values) == length(max_values) == D`).
- Define methods for these types.
- The conversion to Int8 could be done post hoc (after `build_index`) via a utility function. The resulting finder, which carries the range needed to convert to Int8, would then be passed to `airag`.
- It should implement the two-stage pass with `rescore_multiplier=4` (first on Int8 embeddings, then with Float x Int8).
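
To make the proposal concrete, here is a minimal standalone sketch of the quantization and the two-stage search. It is not tied to the package's actual types: `Int8Index`, `quantize_int8`, `build_int8_index`, and `find_similar_int8` are hypothetical names used only for illustration. It also assumes that embeddings are stored as a `D x N` matrix (one column per document) and that each dimension's range is mapped linearly onto `-128:127`.

```julia
using LinearAlgebra

# Hypothetical container (not a package type): Int8 embeddings, D x N,
# plus the per-dimension range that was used to quantize them.
struct Int8Index
    emb::Matrix{Int8}
    min_values::Vector{Float32}
    max_values::Vector{Float32}
end

# Map each dimension's [min, max] range linearly onto -128:127.
# Values outside the calibrated range are clamped.
function quantize_int8(x::AbstractVecOrMat{<:Real}, min_values, max_values)
    scale = max.(max_values .- min_values, eps(Float32)) ./ 255
    q = round.((x .- min_values) ./ scale) .- 128
    return Int8.(clamp.(q, -128, 127))
end

# Post hoc conversion: derive the ranges from the existing Float embeddings.
function build_int8_index(emb::AbstractMatrix{<:Real})
    min_values = Float32.(vec(minimum(emb; dims = 2)))
    max_values = Float32.(vec(maximum(emb; dims = 2)))
    return Int8Index(quantize_int8(emb, min_values, max_values), min_values, max_values)
end

# Two-stage search: coarse Int8 x Int8 pass, then Float x Int8 rescoring.
function find_similar_int8(index::Int8Index, query::AbstractVector{<:Real};
        top_k::Int = 5, rescore_multiplier::Int = 4)
    # Stage 1: quantize the query and score all documents,
    # accumulating in Int32 so the products cannot overflow Int8.
    q8 = quantize_int8(query, index.min_values, index.max_values)
    coarse = Int32.(index.emb)' * Int32.(q8)
    n = min(top_k * rescore_multiplier, length(coarse))
    candidates = partialsortperm(coarse, 1:n; rev = true)

    # Stage 2: rescore only the candidates with the original Float query.
    fine = Float32.(index.emb[:, candidates])' * Float32.(query)
    order = partialsortperm(fine, 1:min(top_k, n); rev = true)
    return candidates[order], fine[order]
end

# Usage with toy data (D = 2 dimensions, N = 3 documents)
emb = Float32[0.9 0.1 -0.8; 0.1 0.9 -0.2]
index = build_int8_index(emb)
ids, scores = find_similar_int8(index, Float32[1.0, 0.0]; top_k = 1)
```

The first pass retrieves `top_k * rescore_multiplier` candidates cheaply, and only those are rescored, so the more expensive Float arithmetic touches a small fraction of the index. The rescored values are useful for ranking but are not on the same scale as the original Float similarities.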

svilupp added the RAG label Apr 10, 2024
