Skip to content

Commit

Permalink
fast binary search: add links to code and resources
Browse files Browse the repository at this point in the history
  • Loading branch information
RagnarGrootKoerkamp committed Sep 19, 2024
1 parent 568f160 commit 97d6cbd
Showing 1 changed file with 34 additions and 1 deletion.
35 changes: 34 additions & 1 deletion posts/fast-binary-search.org
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,33 @@
#+toc: headlines 3
#+date: <2024-09-08 Sun>

** TODO Memory efficiency
* High level ideas
- Prefix table: for each 20-bit prefix, store the corresponding range of the array.
- Interpolation: Make one or more interpolation steps. Could store max resulting error.
- Drawback: can cause an unpredictable number of resulting iterations.
- Batching: process multiple (8-32) queries at the same time, hiding memory latency
- Query bucketing: given >>1M of queries, partition them into 1M buckets and
answer bucket by bucket.
- Eytzinger layout
- B-tree layout
- prefetching (either next Eytzinger iteration, or in the batch)

** Resources
- Algorithmica: https://en.algorithmica.org/hpc/data-structures/
- [cite/t:@khuong-array-layouts]
- https://www.cai.sk/ojs/index.php/cai/article/view/2019_3_555

** Code
- [[https://github.com/RagnarGrootKoerkamp/suffix-array-searching][github:RagnarGrootKoerkamp/suffix-array-searching]]
- Some initial binary search and Btree variants.
- [[https://github.com/RagnarGrootKoerkamp/cpu-benchmarks][github:RagnarGrootKoerkamp/cpu-benchmarks]]
- low-level CPU benchmarks to get upper bounds on potential performance

* To measure
- Max random access cacheline throughput (1 and many threads)
- Also variants for fetching 2/3/4 consecutive cachelines.

* TODO Memory efficiency

Suppose our task is to find an integer $q$ in a sorted list $A$ of length $n$.
One option is to use binary search, but using a B-tree or the Eytzinger layout
Expand All @@ -17,6 +43,13 @@ layout.
That is: which method puts the least pressure on the memory system, and can thus
get higher potential throughput

Let's say we are searching an array consisting of $n=2^k$ $b$ byte elements.
** B-tree
A cache-line has 64 bytes. Set $B = 64/b$. Each node stores $B-1$ values and a
gap, and corresponds to $\ell=\lg_2(B)$ levels of the tree.
$\ell$ values of each cache-line are used.




#+print_bibliography:

0 comments on commit 97d6cbd

Please sign in to comment.