fast binary search: add links to code and resources

RagnarGrootKoerkamp · Sep 19, 2024 · 97d6cbd · 97d6cbd
1 parent 568f160
commit 97d6cbd
Showing 1 changed file with 34 additions and 1 deletion.
diff --git a/posts/fast-binary-search.org b/posts/fast-binary-search.org
@@ -5,7 +5,33 @@
 #+toc: headlines 3
 #+date: <2024-09-08 Sun>
 
-** TODO Memory efficiency
+* High level ideas
+- Prefix table: for each 20-bit prefix, store the corresponding range of the array.
+- Interpolation: Make one or more interpolation steps. Could store max resulting error.
+  - Drawback: can cause an unpredictable number of resulting iterations.
+- Batching: process multiple (8-32) queries at the same time, hiding memory latency
+- Query bucketing: given >>1M of queries, partition them into 1M buckets and
+  answer bucket by bucket.
+- Eytzinger layout
+- B-tree layout
+- prefetching (either next Eytzinger iteration, or in the batch)
+
+** Resources
+- Algorithmica: https://en.algorithmica.org/hpc/data-structures/
+- [cite/t:@khuong-array-layouts]
+- https://www.cai.sk/ojs/index.php/cai/article/view/2019_3_555
+
+** Code
+- [[https://github.com/RagnarGrootKoerkamp/suffix-array-searching][github:RagnarGrootKoerkamp/suffix-array-searching]]
+  - Some initial binary search and Btree variants.
+- [[https://github.com/RagnarGrootKoerkamp/cpu-benchmarks][github:RagnarGrootKoerkamp/cpu-benchmarks]]
+  - low-level CPU benchmarks to get upper bounds on potential performance
+
+* To measure
+- Max random access cacheline throughput (1 and many threads)
+  - Also variants for fetching 2/3/4 consecutive cachelines.
+
+* TODO Memory efficiency
 
 Suppose our task is to find an integer $q$ in a sorted list $A$ of length $n$.
 One option is to use binary search, but using a B-tree or the Eytzinger layout
@@ -17,6 +43,13 @@ layout.
 That is: which method puts the least pressure on the memory system, and can thus
 get higher potential throughput
 
+Let's say we are searching an array consisting of $n=2^k$ $b$ byte elements.
+** B-tree
+A cache-line has 64 bytes. Set $B = 64/b$. Each node stores $B-1$ values and a
+gap, and corresponds to $\ell=\lg_2(B)$ levels of the tree.
+$\ell$ values of each cache-line are used.
+
+
 
 
 #+print_bibliography: