minimizer review comments

RagnarGrootKoerkamp · Oct 15, 2024 · 1a8e32c
1 parent 0cd04a5
commit 1a8e32c

Showing 2 changed files with 97 additions and 0 deletions.
diff --git a/posts/minimizer-review-comments.org b/posts/minimizer-review-comments.org
@@ -0,0 +1,64 @@
+#+title: Comments on 'When Less is More' minimizer review
+#+filetags: @paper-review minimizers
+#+OPTIONS: ^:{} num: num:
+#+hugo_front_matter_key_replace: author>authors
+#+toc: headlines 3
+#+date: <2024-10-15 Tue>
+
+These are some (biased) comments on [cite/title/b:@minimizer-review-2] [cite:@minimizer-review-2].
+
+* The importance of ordering
+
+#+begin_quote
+the interest lies in constructing a minimizer with a density within a constant
+factor, i.e., $O(1/w)$ for any $k$.  With lexicographic ordering, minimizers can
+achieve such density, but with large $k$ values ($\geq \log_{|Σ|}(w)-c$ for a
+constant $c$), which might not be desirable [cite:@miniception]. However, random
+ordering can result in a lower density than that of the lexicographic ordering.
+Thus,  random ordering (implemented with pseudo-random hash functions) is
+usually used in practice.
+#+end_quote
+- I typically consider $k = \log_\sigma w$ to be small. Random minimizers fail
+  to have density $O(1/w)$ only for very small $k$, say up to $4$. So
+  in general, reaching $O(1/w)$ is easy unless $k$ is very small.
+- As shown in Theorem 2 of [cite/t:@miniception], lexicographic minimizers are
+  optimal, in that they have density $O(1/w)$ if and only if this is possible at all.
+  Some motivation for why random ordering is in fact better in practice would be good.
+
+#+begin_quote
+Recent investigations indicate that ordering algorithms can achieve a density value of
+$1.8/(w + 1)$ [cite:@docks-wabi], well below the originally proposed lower bound of $2/(w + 1)$ [cite:@sketching-and-sublinear-datastructures;@minimizers].
+#+end_quote
+- I cannot find the $1.8/(w+1)$ in either [cite/t:@docks-wabi] or [cite/t:@docks].
+- For which $k$? For $k=1$, this is impossible. For $k>w$, miniception [cite:@miniception] is
+  better at $1.67/w$, and in fact, mod-minimizer [cite:@modmini] is even better and
+  asymptotically reaches density $1/w$, so this $1.8/(w+1)$ is quite meaningless anyway.
+- A remark that the original lower bound doesn't apply because of overly strong
+  assumptions would be in order here. Otherwise the sentence kinda contradicts itself.
+
+
+* Asymptotically optimal minimizers
+
+#+begin_quote
+This dual-minimizer setup has been shown to achieve
+an upper bound expected density of $1.67/(w + 1)$, which is lower than the $2/(w + 1)$
+density of traditional random minimizers.
+#+end_quote
+- Again, only for $k>w$.
+
+#+begin_quote
+the lower
+bound of the resulting sketch ($1.67/(w + 1)$) is higher than the theoretical lower bound
+($1/w$), which can be achieved using UHS or Polar Sets.
+#+end_quote
+- This should say /upper bound/ instead of /lower bound/.
+- This paragraph is titled /asymptotically optimal minimizers/, yet you only
+  talk about miniception, which is not in fact asymptotically optimal.
+  UHS and Polar sets are also not really 'plain' minimizers.
+
+  Instead, [cite/t:@asymptotic-optimal-minimizers] present an actual asymptotically
+  optimal minimizer scheme based on universal hitting sets, and
+  [cite/t:@modmini] present an asymptotically optimal scheme with /much lower
+  density in practice/.
+
+#+print_bibliography:
diff --git a/references.bib b/references.bib
@@ -3871,3 +3871,36 @@ @Article{suffix-arrays-manber-myers-90
   url          = {http://dx.doi.org/10.1137/0222058},
   publisher    = {Society for Industrial & Applied Mathematics (SIAM)}
 }
+
+@Article{minimizer-review-2,
+  author       = {Ndiaye, Malick and Prieto-Baños, Silvia and Fitzgerald, Lucy
+                  M. and Yazdizadeh Kharrazi, Ali and Oreshkov, Sergey and
+                  Dessimoz, Christophe and Sedlazeck, Fritz J. and Glover,
+                  Natasha and Majidian, Sina},
+  title        = {When less is more: sketching with minimizers in genomics},
+  journal      = {Genome Biology},
+  year         = 2024,
+  volume       = 25,
+  number       = 1,
+  month        = oct,
+  issn         = {1474-760X},
+  doi          = {10.1186/s13059-024-03414-4},
+  url          = {http://dx.doi.org/10.1186/s13059-024-03414-4},
+  publisher    = {Springer Science and Business Media LLC}
+}
+
+@Article{sketching-and-sublinear-datastructures,
+  author       = {Marçais, Guillaume and Solomon, Brad and Patro, Rob and
+                  Kingsford, Carl},
+  title        = {Sketching and Sublinear Data Structures in Genomics},
+  journal      = {Annual Review of Biomedical Data Science},
+  year         = 2019,
+  volume       = 2,
+  number       = 1,
+  month        = jul,
+  pages        = {93–118},
+  issn         = {2574-3414},
+  doi          = {10.1146/annurev-biodatasci-072018-021156},
+  url          = {http://dx.doi.org/10.1146/annurev-biodatasci-072018-021156},
+  publisher    = {Annual Reviews}
+}
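
The density figures discussed in the review comments above can be sanity-checked empirically. The following is a minimal sketch in Python and is not part of the commit: it measures the density of a minimizer scheme on a random DNA string, assuming leftmost tie-breaking and a memoized pseudo-random order on k-mers. The helper names `random_dna`, `random_order`, and `minimizer_density` are hypothetical and introduced here only for illustration.

```python
import random


def random_dna(n, seed=0):
    """Return a uniformly random string of length n over the alphabet ACGT."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


def random_order(seed=0):
    """Return a key function assigning each distinct k-mer a fixed random rank.

    This stands in for a pseudo-random hash; it is an assumption of this
    sketch, not the hash used by any particular tool.
    """
    rng = random.Random(seed)
    rank = {}

    def key(kmer):
        if kmer not in rank:
            rank[kmer] = rng.random()
        return rank[kmer]

    return key


def minimizer_density(text, k, w, key):
    """Fraction of k-mer positions sampled by a minimizer scheme.

    Each window of w consecutive k-mers selects its smallest k-mer under
    `key`; ties are broken by taking the leftmost position.
    """
    keys = [key(text[i:i + k]) for i in range(len(text) - k + 1)]
    selected = set()
    for start in range(len(keys) - w + 1):
        # min() returns the first minimal element, i.e. the leftmost minimum.
        best = min(range(start, start + w), key=lambda j: keys[j])
        selected.add(best)
    return len(selected) / len(keys)


if __name__ == "__main__":
    text = random_dna(20000, seed=1)
    w = 10
    for k in (1, 2, 4, 16):
        rand = minimizer_density(text, k, w, random_order(seed=2))
        lex = minimizer_density(text, k, w, lambda kmer: kmer)
        print(f"k={k:2d}  random={rand:.4f}  lex={lex:.4f}  2/(w+1)={2 / (w + 1):.4f}")
```

Under these assumptions, the random order with large $k$ should land near $2/(w+1)$. For $k=1$, every occurrence of the smallest character is sampled, so the density stays at roughly $1/\sigma$ or above no matter how large $w$ is, which is why a bound like $1.8/(w+1)$ cannot hold for all $k$.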