n-gram width
Sentinel character (https://en.wikipedia.org/wiki/Sentinel_value): appended to the term by add and has, to ensure that whole terms in the index are checked.
Normalises search terms; defaults to the static Index.normalise.
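The role of the sentinel can be illustrated with a small, self-contained sketch. The `ngramsOf` helper below is hypothetical and is not Ngrammy's implementation; it only shows why appending a sentinel lets n-grams distinguish a whole term from a prefix of a longer term.

```typescript
// Hypothetical helper (not part of Ngrammy): all substrings of length n.
function ngramsOf(text: string, n: number): string[] {
  const grams: string[] = []
  for (let i = 0; i + n <= text.length; i++) {
    grams.push(text.slice(i, i + n))
  }
  return grams
}

const sentinel = '\n'

// Without a sentinel, every bigram of "cat" also occurs in "cats",
// so bigrams alone cannot tell a whole term from a prefix.
const plain = ngramsOf('cat', 2)
const plainLonger = ngramsOf('cats', 2)
const prefixLooksWhole = plain.every((gram) => plainLonger.includes(gram))

// With the sentinel appended, "cat\n" ends in the bigram "t\n",
// which "cats\n" does not contain.
const marked = ngramsOf('cat' + sentinel, 2)
const markedLonger = ngramsOf('cats' + sentinel, 2)
const stillLooksWhole = marked.every((gram) => markedLonger.includes(gram))

console.log(prefixLooksWhole, stillLooksWhole) // true false
```

The sentinel should be a character that cannot occur in a normalised term, which is why a whitespace-collapsing normalisation pairs well with a newline sentinel.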
Does the index have the term?
Throws RangeError if the term is shorter than n.
Normalise a term.
Calls the normalisation function given to the constructor, and checks that the term is at least as long as the n-gram width (n) of the index.
Throws RangeError if the term is shorter than n.
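A minimal sketch of this normalise-then-check behaviour follows. The names `normaliseTerm` and `trimAndLowercase` are hypothetical, the signature of `NormaliseFunction` is assumed, and it is also an assumption that the length check runs after normalisation and counts UTF-16 code units; the library's actual method may differ.

```typescript
// Assumed shape of a normalisation function: string in, string out.
type NormaliseFunction = (term: string) => string

// Hypothetical stand-in for a function passed to the constructor.
const trimAndLowercase: NormaliseFunction = (term) => term.trim().toLowerCase()

// Normalise first, then reject terms shorter than the n-gram width.
function normaliseTerm(
  term: string,
  n: number,
  normalise: NormaliseFunction
): string {
  const normalised = normalise(term)
  if (normalised.length < n) {
    throw new RangeError(`Term must be at least ${n} characters long`)
  }
  return normalised
}

console.log(normaliseTerm('  Cat ', 2, trimAndLowercase)) // "cat"
```

In this sketch, calling `normaliseTerm(' a ', 2, trimAndLowercase)` throws a RangeError, because the normalised term `"a"` is shorter than the bigram width.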
Static from
Create an index from terms.
Terms can be an array of terms or an object mapping ids to terms.
Rest ...args: [number?, string?, NormaliseFunction?]
Static normalise
Static update

Get ngrams of length n from input.
@see: N-gram at Wikipedia (https://en.wikipedia.org/wiki/N-gram)
<span>
<a href="https://github.com/peterhil/ngrammy/actions/workflows/main.yml"><img alt="ci status" src="https://github.com/peterhil/ngrammy/workflows/CI/badge.svg"></a>
<a href="https://github.com/peterhil/ngrammy/actions/workflows/docs.yml"><img alt="docs status" src="https://github.com/peterhil/ngrammy/workflows/Docs/badge.svg"></a>
<a href="https://github.com/peterhil/ngrammy/actions/workflows/size.yml"><img alt="size status" src="https://github.com/peterhil/ngrammy/workflows/size/badge.svg"></a>
<a href="https://codeclimate.com/github/peterhil/ngrammy/maintainability"><img alt="maintainability" src="https://api.codeclimate.com/v1/badges/46e067100c6bce035c84/maintainability"></a>
<a href="https://codeclimate.com/github/peterhil/ngrammy/test_coverage"><img alt="code coverage" src="https://api.codeclimate.com/v1/badges/46e067100c6bce035c84/test_coverage"></a>
</span>
Ngrammy is a Unicode-capable, n-gram based search index library for writing custom autocompletions. It is a small (< 10 kB) TypeScript library with full test coverage and Rambdax as the only dependency.
See the library documentation, and especially:

I wrote this library to make a fast category search with autocomplete for my browser extension called Spellbook, so here is a related example:
import ngrammy from 'ngrammy'
import { filter, pick, pipe, values } from 'rambda'
import { flattenTree } from '../api/categories'
import { isCategory } from '../api/helpers'

// Plain object mapping category ids to categories
const allCategories = {}

// 1. Create an index of bigrams with newline as the sentinel
//
// The default normalisation function (3rd parameter) will collapse
// all whitespace into single space characters, so newline is a
// safe (and default) choice for sentinel.
const index = new ngrammy.Index(2, '\n')

function prepareIndex () {
  if (index.size() > 0) {
    console.debug('index exists already')
  } else {
    console.debug('preparing index')
    // `browser` is the WebExtension API global
    browser.bookmarks.getTree().then((bookmarks) => {
      const filterCategories = pipe(flattenTree, filter(isCategory))
      const categories = filterCategories(bookmarks)

      // 2. Add terms to index
      for (const category of categories) {
        index.add(category.title, category.id)
        allCategories[category.id] = category
      }
    })
  }
}

function categorySearch (query) {
  // 3. Search the index (index.locations would also return positions)
  const ids = index.search(query)
  const result = pick(ids, allCategories) // allCategories is an object

  // sortByTitleCaseInsensitive is not defined in this snippet
  const sorted = sortByTitleCaseInsensitive(values(result))

  return sorted
}
See the search tests for more examples. The tests for search and locations are especially instructive.
Install Ngrammy with:
pnpm install ngrammy
There are various other scripts for development:
pnpm dev                    # watch sources
pnpm build                  # build project
pnpm test                   # run tests with tap
pnpm coverage -- --browser  # generate code coverage report
pnpm doc                    # generate documentation
pnpm lint                   # run eslint
pnpm analyze                # run size-limit --why
pnpm size                   # run size-limit
Many n-gram libraries support only the Basic Latin (ASCII) character set. Ngrammy, on the other hand:
Ngrammy supports all Unicode whitespace characters when doing normalisation, including EBCDIC New Line, which is mapped to Unicode as U+0085 (NEL) and has caused considerable trouble with XML parsing.
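To illustrate why NEL needs explicit handling, here is a minimal sketch of whitespace collapsing. The name `collapseWhitespace` is hypothetical and this is not Ngrammy's actual normalisation function; it relies only on the fact that the regular expression class `\s` in JavaScript does not match U+0085.

```typescript
// JavaScript's \s class covers Unicode space separators and line
// terminators, but not U+0085 (NEL), so NEL is listed explicitly.
const UNICODE_WHITESPACE = /[\s\u0085]+/g

// Hypothetical helper: collapse each run of whitespace into one space.
function collapseWhitespace(text: string): string {
  return text.replace(UNICODE_WHITESPACE, ' ')
}

// NEL, an em space, a plain space and a tab all collapse to single spaces.
console.log(collapseWhitespace('foo\u0085\u2003 bar\tbaz')) // "foo bar baz"
```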
See the Index class constructor documentation.
Possible use cases for customisations:
Get ngrams of length n from input.
@see: N-gram at Wikipedia (https://en.wikipedia.org/wiki/N-gram)
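As a rough sketch of what extracting n-grams from Unicode input can look like, the hypothetical function below splits the input by code point, so that characters outside the Basic Multilingual Plane are not cut in half. Splitting by code point is an assumption made for illustration; the library's own function may work differently.

```typescript
// Hypothetical sketch: n-grams of length n, counted in code points.
function ngrams(input: string, n: number): string[] {
  // Array.from iterates a string by code point, keeping surrogate
  // pairs (such as emoji) together as one element.
  const chars = Array.from(input)
  const grams: string[] = []
  for (let i = 0; i + n <= chars.length; i++) {
    grams.push(chars.slice(i, i + n).join(''))
  }
  return grams
}

console.log(ngrams('ngram', 2)) // [ 'ng', 'gr', 'ra', 'am' ]
```

An input shorter than n yields an empty array in this sketch.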
Unicode-capable N-gram (https://en.wikipedia.org/wiki/N-gram) search index for writing custom autocompletions (https://en.wikipedia.org/wiki/Autocomplete)