Optionally add trimmer to search pipeline #154

Open
wants to merge 9 commits into
base: master

Conversation

dhdaines
Contributor

@dhdaines dhdaines commented Jul 5, 2024

Fixes #151 but breaks bug-compatibility with lunr.js when an option is enabled (which also works, most of the time, in lunr.js with serialized models from lunr.py). The JavaScript-side workaround is noted in olivernn/lunr.js#532 ... will lunr.js get updated? Magic 8-ball says "UNLIKELY"

dhdaines changed the title from "Add trimmer and stemmer to search (breaks bug-compatibility with lunr.js)" to "Add trimmer and stopword filter to search (breaks bug-compatibility with lunr.js)" on Jul 6, 2024
Contributor Author

dhdaines commented Jul 6, 2024

Note that adding the stopword filter to search isn't really necessary since those terms just won't be in the index.

The trimmer, on the other hand, is really useful for the reason mentioned above.

But again ... this breaks compatibility with lunr.js so you probably shouldn't merge it!
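To make the trimmer point concrete, here is a minimal, self-contained sketch in plain Python. It is a toy model, not lunr.py's actual code: the `trim`, `tokenize`, `build_index` and `search` helpers are all hypothetical, and the whitespace-only tokenizer is an assumption chosen so that punctuation stays attached to query terms. It shows that when the index is built with a trimmer but the query is not trimmed, a term such as `hello,` cannot match the indexed term `hello`.

```python
import re


def trim(token):
    """Simplified trimmer: strip leading and trailing non-word characters."""
    return re.sub(r"^\W+|\W+$", "", token)


def tokenize(text):
    """Lowercase and split on whitespace only, so punctuation stays attached."""
    return text.lower().split()


def build_index(docs):
    """Toy inverted index; the indexing pipeline always applies the trimmer."""
    index = {}
    for doc_id, text in docs.items():
        for token in tokenize(text):
            term = trim(token)
            if term:
                index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, search_pipeline=()):
    """Run each query token through the search pipeline, then look it up."""
    hits = set()
    for token in tokenize(query):
        for step in search_pipeline:
            token = step(token)
        hits |= index.get(token, set())
    return hits


docs = {"a": "Hello, world!", "b": "Goodbye world"}
index = build_index(docs)

# Without a trimmer in the search pipeline, "hello," is looked up verbatim.
without_trimmer = search(index, "hello,")
# With the trimmer, the query term is reduced to "hello" before lookup.
with_trimmer = search(index, "hello,", search_pipeline=(trim,))

print(without_trimmer)  # set()
print(with_trimmer)     # {'a'}
```
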

dhdaines changed the title from "Add trimmer and stopword filter to search (breaks bug-compatibility with lunr.js)" to "Add trimmer to search (breaks bug-compatibility with lunr.js)" on Jul 6, 2024
Contributor Author

dhdaines commented Jul 6, 2024

Updated this because the stopword filter actually isn't useful in the search pipeline. But the trimmer is!

dhdaines changed the title from "Add trimmer to search (breaks bug-compatibility with lunr.js)" to "Add trimmer to search" on Sep 9, 2024
Contributor Author

dhdaines commented Sep 9, 2024

Updated again - the behaviour is disabled by default, but can be enabled with the trimmer_in_search argument to get_default_builder. The resulting models should also work in lunr.js except in the case where multiple languages are used (which maybe doesn't work in lunr.js anyway?)
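The opt-in design can be sketched roughly as follows. This is not the code from this PR and not lunr.py's `get_default_builder`: `make_pipelines`, `run` and `toy_stem` are hypothetical stand-ins, and placing the trimmer ahead of the stemmer in the search pipeline is an assumption. Only the flag name `trimmer_in_search` and its off-by-default behaviour come from the comment above.

```python
import re


def trim(token):
    """Simplified trimmer: drop leading/trailing non-word characters."""
    return re.sub(r"^\W+|\W+$", "", token)


def toy_stem(token):
    """Stand-in for a real stemmer: strips a single trailing 's' only."""
    return token[:-1] if token.endswith("s") and len(token) > 1 else token


def make_pipelines(trimmer_in_search=False):
    """Hypothetical helper mirroring the opt-in flag described above.

    Returns (index_pipeline, search_pipeline) as lists of functions.
    """
    index_pipeline = [trim, toy_stem]
    search_pipeline = [toy_stem]  # default: search pipeline has no trimmer
    if trimmer_in_search:
        # Opt in: trim query terms before stemming them (assumed ordering).
        search_pipeline.insert(0, trim)
    return index_pipeline, search_pipeline


def run(pipeline, token):
    """Apply each pipeline step to the token in order."""
    for step in pipeline:
        token = step(token)
    return token


_, default_search = make_pipelines()
_, opted_in_search = make_pipelines(trimmer_in_search=True)

print(run(default_search, "trimmers,"))   # trimmers,
print(run(opted_in_search, "trimmers,"))  # trimmer
```

Keeping the flag off by default means existing callers see the same search pipeline as before, which is the compatibility concern raised earlier in this thread.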


codecov-commenter commented Sep 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.10%. Comparing base (d07b60f) to head (9d9a7ff).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #154      +/-   ##
==========================================
+ Coverage   96.02%   96.10%   +0.07%     
==========================================
  Files          48       48              
  Lines        3171     3206      +35     
==========================================
+ Hits         3045     3081      +36     
+ Misses        126      125       -1     


dhdaines changed the title from "Add trimmer to search" to "Optionally add trimmer to search pipeline" on Sep 9, 2024
Contributor Author

dhdaines commented Sep 9, 2024

The question then is whether you want the option to also add the stopword filter in search. As mentioned, it doesn't actually do anything, because those terms just won't be matched. Also, I'm not sure what happens with multi-language models in that case.
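A small toy model illustrates why a stopword filter at search time does not change results. Everything here is hypothetical (the three-word `STOP_WORDS` list, `drop_stop_words`, `build_index`, `search`), and it assumes simple OR-style matching where each query term independently contributes hits. Under that assumption, a stop word in the query looks up nothing, because it was never indexed.

```python
STOP_WORDS = {"the", "and", "of"}  # tiny illustrative list, not lunr's


def drop_stop_words(tokens):
    """Remove stop words from a list of tokens."""
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Toy inverted index; stop words are filtered out at indexing time."""
    index = {}
    for doc_id, text in docs.items():
        for term in drop_stop_words(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, filter_stop_words=False):
    """OR-style lookup of each query term, optionally filtering stop words."""
    tokens = query.lower().split()
    if filter_stop_words:
        tokens = drop_stop_words(tokens)
    hits = set()
    for term in tokens:
        hits |= index.get(term, set())
    return hits


index = build_index({"a": "the art of search", "b": "search and rescue"})

unfiltered = search(index, "the art")
filtered = search(index, "the art", filter_stop_words=True)

print(unfiltered == filtered)  # True
```

The equivalence would not necessarily hold for queries that mark a stop word as a required term, which this sketch does not model.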

Development

Successfully merging this pull request may close these issues.

Trimmer and stop word filter are missing from search pipelines
2 participants