Commit

Apply suggestions from code review
Co-authored-by: Micka <[email protected]>
cjnolet and lowener authored Oct 3, 2024
1 parent 82ec71b commit a84978a
Showing 4 changed files with 5 additions and 5 deletions.
2 changes: 1 addition & 1 deletion docs/source/choosing_and_configuring_indexes.rst
@@ -2,7 +2,7 @@
Primer on vector search indexes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Vector search indexes often use approximations to trade-off accuracy of the results for speed, either through lowering latency (end-to-end single query speed) or by increating throughput (the number of query vectors that can be satisfied in a short period of time). Vector search indexes, especially ones that use approximations, are very closely related to machine learning models but they are optimized for fast search and accuracy of results.
+Vector search indexes often use approximations to trade-off accuracy of the results for speed, either through lowering latency (end-to-end single query speed) or by increasing throughput (the number of query vectors that can be satisfied in a short period of time). Vector search indexes, especially ones that use approximations, are very closely related to machine learning models but they are optimized for fast search and accuracy of results.

When the number of vectors is very small, such as less than 100 thousand vectors, it could be fast enough to use a brute-force (also known as a flat index), which returns exact results but at the expense of exhaustively searching all possible neighbors
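The flat (brute-force) index mentioned in this context line can be sketched in a few lines of NumPy. This is an illustrative helper for exact exhaustive search, not the cuVS brute-force API:

```python
import numpy as np

def brute_force_knn(dataset, queries, k):
    """Exact k-NN by exhaustively scanning every dataset vector (a 'flat' index)."""
    # Pairwise squared L2 distances, shape (n_queries, n_dataset)
    dists = ((queries[:, None, :] - dataset[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k smallest distances per query, sorted nearest-first
    idx = np.argsort(dists, axis=1)[:, :k]
    return idx, np.take_along_axis(dists, idx, axis=1)

# Tiny usage example
data = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
q = np.array([[0.1, 0.0]])
neighbors, _ = brute_force_knn(data, q, k=2)
print(neighbors)  # exact nearest neighbors; cost grows linearly with dataset size
```

Because every vector is scanned, recall is always 100%, which is why this is only practical for small collections.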

2 changes: 1 addition & 1 deletion docs/source/comparing_indexes.rst
@@ -40,7 +40,7 @@ We suggest averaging performance within a range of recall. For general guidance,
.. image:: images/recall_buckets.png


-This allows us to say things like “okay at 95% recall level, model A can be built 3x faster than model B, but model B has 2x lower latency than model A”
+This allows us to make observations such as “at 95% recall level, model A can be built 3x faster than model B, but model B has 2x lower latency than model A”

.. image:: images/build_benchmarks.png
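Averaging performance within a recall range, as the hunk context above suggests, might look like this sketch. The benchmark numbers here are invented for illustration, not real measurements:

```python
# Hypothetical benchmark rows: (recall, queries_per_second) per run.
results = {
    "model_A": [(0.94, 1200.0), (0.96, 1100.0), (0.99, 400.0)],
    "model_B": [(0.95, 2300.0), (0.97, 2100.0), (0.999, 300.0)],
}

def mean_qps_in_bucket(rows, lo, hi):
    """Average throughput over runs whose recall falls in [lo, hi)."""
    qps = [q for r, q in rows if lo <= r < hi]
    return sum(qps) / len(qps) if qps else None

# Compare models inside the 95%-99% recall bucket
for name, rows in results.items():
    print(name, mean_qps_in_bucket(rows, 0.95, 0.99))
```

Bucketing first, then averaging, avoids unfairly comparing a run at 95% recall against one at 99.9%, where throughput is naturally much lower.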

2 changes: 1 addition & 1 deletion docs/source/cuvs_bench/index.rst
@@ -129,7 +129,7 @@ The usage of the script `cuvs_bench.run` is:
--build
--search
--algorithms ALGORITHMS
-                        run only comma separated list of named algorithms. If parameters `groups` and `algo-groups are both undefined, then group `base` is run by default (default: None)
+                        run only comma separated list of named algorithms. If parameters `groups` and `algo-groups` are both undefined, then group `base` is run by default (default: None)
--groups GROUPS run only comma separated groups of parameters (default: base)
--algo-groups ALGO_GROUPS
add comma separated <algorithm>.<group> to run. Example usage: "--algo-groups=raft_cagra.large,hnswlib.large" (default: None)
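A hypothetical invocation combining the flags documented above (the algorithm names are taken from the `--algo-groups` example in the help text; treat the exact module path as an assumption):

```shell
# Build and search the default "base" parameter group for two algorithms
python -m cuvs_bench.run --build --search \
    --algorithms raft_cagra,hnswlib
```

Since `--groups` defaults to `base`, omitting it runs the `base` group as the help text describes.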
4 changes: 2 additions & 2 deletions docs/source/indexes/ivfflat.rst
@@ -77,8 +77,8 @@ assumption that the number of lists, and thus the max size of the data in the in
might not matter. For example, most vector databases build many smaller physical approximate nearest neighbors indexes, each from
fixed-size or maximum-sized immutable segments and so the number of lists can be tuned based on the number of vectors in the indexes.

-Empirically, we've found :math:`\sqrt{n_index_vectors}` to be a good starting point for the :math:`n_lists` hyper-parameter. Remember, having more
-lists means less points to search within each list, but it could also mean more :math:`n_probes` are needed at search time to reach an acceptable
+Empirically, we've found :math:`\sqrt{n\_index\_vectors}` to be a good starting point for the :math:`n\_lists` hyper-parameter. Remember, having more
+lists means less points to search within each list, but it could also mean more :math:`n\_probes` are needed at search time to reach an acceptable
recall.
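The square-root heuristic in the corrected lines above can be sketched as follows; `starting_n_lists` is a hypothetical helper name, not a cuVS function:

```python
import math

def starting_n_lists(n_index_vectors: int) -> int:
    """Heuristic starting point: n_lists ~ sqrt(number of indexed vectors)."""
    return max(1, round(math.sqrt(n_index_vectors)))

# With 1M vectors this suggests ~1000 lists, so each list holds ~1000 points
# on average; n_probes then controls how many lists each query actually scans.
print(starting_n_lists(1_000_000))  # 1000
```

This is only a tuning starting point: raising `n_lists` shrinks each list but may require a larger `n_probes` at search time to hold recall constant, exactly the trade-off the paragraph describes.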


