Add FM-index implementations post
RagnarGrootKoerkamp committed Oct 2, 2024
1 parent f2bc51e commit e417905
Showing 1 changed file with 25 additions and 0 deletions.
25 changes: 25 additions & 0 deletions posts/fm-index-implementations.org
@@ -0,0 +1,25 @@
#+title: FM-index implementations
#+filetags: @survey
#+OPTIONS: ^:{} num:t
#+hugo_front_matter_key_replace: author>authors
#+toc: headlines 3
#+date: <2024-10-02 Wed>

Here I'll briefly list some FM-index and related implementations around the web.
Implementations seem relatively inconsistent, mostly because the FM-index is
more of a 'wrapper' type around a given Burrows-Wheeler transform and an
/occurrences/ list, both of which can be implemented in various ways. In
particular, occurrences should be stored in a wavelet tree for optimal compression.
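
To make the 'wrapper' view concrete, here is a minimal sketch in Python. It is not taken from any of the implementations below: the names ~bwt~, ~NaiveFmIndex~, ~less~ and ~occ~ are made up for illustration, the BWT is built naively from sorted rotations, and the occurrences list is a full uncompressed table rather than a wavelet tree.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform via naively sorted rotations.

    Assumes '$' does not occur in text; it is appended as the sentinel.
    """
    assert "$" not in text
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


class NaiveFmIndex:
    """Hypothetical, unoptimised FM-index: a BWT plus a full occurrences table."""

    def __init__(self, text: str):
        self.bwt = bwt(text)
        alphabet = sorted(set(self.bwt))
        # less[c]: number of characters in the BWT strictly smaller than c.
        self.less = {}
        total = 0
        for c in alphabet:
            self.less[c] = total
            total += self.bwt.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (len(self.bwt) + 1) for c in alphabet}
        for i, b in enumerate(self.bwt):
            for c in alphabet:
                self.occ[c][i + 1] = self.occ[c][i] + (b == c)

    def count(self, pattern: str) -> int:
        """Count occurrences of pattern in the text using backward search."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.less:
                return 0
            lo = self.less[c] + self.occ[c][lo]
            hi = self.less[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo
```

For example, ~NaiveFmIndex("banana").count("ana")~ walks the pattern right to left and narrows the suffix-array interval with two ~occ~ lookups per character. Swapping the ~occ~ table for a sampled array or a wavelet tree changes the space and lookup time, but not the search loop.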

- The [[https://github.com/wafflespeanut/nucleic-acid/blob/2adbf5181081245423f974a88b5ccf53d7bf26ac/src/bwt.rs#L96][nucleic-acid repo]] contains a completely unoptimised version.
- The Rust-bio crate contains a [[https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/fmindex.rs#L209][generic FM-index]]. It stores a [[https://github.com/rust-bio/rust-bio/blob/master/src/data_structures/bwt.rs#L75-L94][sampled
occurrences array]], so that space is relatively small but lookups take $O(k)$
time for sampling factor $k$.
- SDSL contains a [[https://github.com/simongog/sdsl-lite/blob/c32874cb2d8524119f25f3b501526fe692df29f4/include/sdsl/wavelet_][wavelet tree]] and [[https://github.com/simongog/sdsl-lite/blob/master/include/sdsl/csa_wt.hpp#L48][compressed suffix array]] implementation based
on it, which provides the same functionality as an FM-index.
- There is the [[https://github.com/rossanoventurini/qwt][Quad Wavelet Tree]] (QWT) Rust crate [cite:@qwt]. This uses a 4-ary
tree instead of the usual binary wavelet tree, and improves latency by around
a factor of 2 over SDSL wavelet trees.
- Dominik Kempa has the [[https://github.com/dominikkempa/faster-minuter?tab=readme-ov-file][Faster-Minuter index]] [cite:@fasterminuter] that contains
an improved wavelet tree as well.
- [[https://github.com/achacond/gem-cutter][GEM-Cutter]] contains a GPU implementation of the FM-index [cite:@gemcutter].
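
The space/time trade-off of a sampled occurrences array, as mentioned for Rust-bio above, can be sketched as follows. This is only an illustration in Python under my own assumptions: the class ~SampledOcc~ and its layout are hypothetical and do not mirror Rust-bio's actual data structure. Counts are stored at every $k$-th position, and a lookup scans fewer than $k$ characters of the BWT past the nearest sample.

```python
class SampledOcc:
    """Hypothetical sampled occurrences array over a BWT string."""

    def __init__(self, bwt: str, k: int):
        assert k >= 1
        self.bwt = bwt
        self.k = k
        alphabet = sorted(set(bwt))
        # samples[c][j]: number of occurrences of c in bwt[:j * k].
        self.samples = {c: [0] for c in alphabet}
        running = {c: 0 for c in alphabet}
        for i, b in enumerate(bwt, start=1):
            running[b] += 1
            if i % k == 0:
                for c in alphabet:
                    self.samples[c].append(running[c])

    def rank(self, c: str, i: int) -> int:
        """Number of occurrences of c in bwt[:i], for 0 <= i <= len(bwt)."""
        if c not in self.samples:
            return 0
        j = i // self.k
        # Start from the nearest sample at or before i, then scan the
        # remaining (fewer than k) characters.
        return self.samples[c][j] + self.bwt.count(c, j * self.k, i)
```

With $k = 1$ this degenerates to the full table, and larger $k$ shrinks the table by a factor of $k$ at the cost of a longer scan per lookup.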
