
Breaking change in nltk 3.8.2 #2926

Open · yifanmai (Collaborator) opened this issue Aug 14, 2024 · 0 comments
Upstream issue: nltk/nltk#3293

nltk 3.8.2 requires the new `punkt_tab` tokenizer resource, which breaks our tests. We can work around this by pinning the version to 3.8.1.

The errors in the tests look like:

E       LookupError: 
E       **********************************************************************
E         Resource punkt_tab not found.
E         Please use the NLTK Downloader to obtain the resource:
E       
E         >>> import nltk
E         >>> nltk.download('punkt_tab')
E         
E         For more information see: https://www.nltk.org/data.html
E       
E         Attempted to load tokenizers/punkt_tab/english/
E       
E         Searched in:
E           - '/home/runner/nltk_data'
E           - '/opt/hostedtoolcache/Python/3.9.19/x64/nltk_data'
E           - '/opt/hostedtoolcache/Python/3.9.19/x64/share/nltk_data'
E           - '/opt/hostedtoolcache/Python/3.9.19/x64/lib/nltk_data'
E           - '/usr/share/nltk_data'
E           - '/usr/local/share/nltk_data'
E           - '/usr/lib/nltk_data'
E           - '/usr/local/lib/nltk_data'
E           - 'benchmark_output/perturbations/synonym'
E       **********************************************************************

Example stack trace:

src/helm/benchmark/metrics/test_bias_metrics.py:16: in check_test_cases
    bias_score = bias_func(test_case.texts)
src/helm/benchmark/metrics/bias_metrics.py:157: in evaluate_stereotypical_associations
    tokens = word_tokenize(text.lower())
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/__init__.py:129: in word_tokenize
    sentences = [text] if preserve_line else sent_tokenize(text, language)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/__init__.py:106: in sent_tokenize
    tokenizer = PunktTokenizer(language)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/punkt.py:1744: in __init__
    self.load_lang(lang)
/opt/hostedtoolcache/Python/3.9.19/x64/lib/python3.9/site-packages/nltk/tokenize/punkt.py:1749: in load_lang
    lang_dir = find(f"tokenizers/punkt_tab/{lang}
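
The pin mentioned above could be applied wherever the project declares its dependencies; a minimal sketch, assuming a `requirements.txt`-style file (the actual location in HELM may differ):

```
# Hold nltk back until the code is updated for the new punkt_tab resource
nltk==3.8.1
```

Alternatively, the error message itself points at the forward fix for 3.8.2+: run `nltk.download('punkt_tab')` once in the environment before tokenizing.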