After switching to the pytorch_april_patched environment and running pip install -r requirements.txt, I hit the following error:
Producing dataset wiki...
encoding file testdata/wikiextracted/AA/wiki_01.txt ...
Traceback (most recent call last):
File "train.py", line 1036, in <module>
eval(f'test_{g.args.test}()')
File "<string>", line 1, in <module>
File "train.py", line 940, in test_checkpoint_wiki
data_setup()
File "train.py", line 333, in data_setup
g.corpus = get_lm_corpus(g.args.data, g.args.dataset, use_bpe=g.args.bpe)
File "/home/ubuntu/data_utils.py", line 381, in get_lm_corpus
corpus = Corpus(datadir, dataset, use_bpe, **kwargs)
File "/home/ubuntu/data_utils.py", line 309, in __init__
self.valid = self.vocab.encode_file(valid_path, ordered=True)
File "/home/ubuntu/utils/vocabulary.py", line 204, in encode_file
tokens: List[int] = self.tokenizer.encode(text) + [self.EOT]
File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 261, in encode
return self.convert_tokens_to_ids(self.tokenize(text))
File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 224, in tokenize
token = ''.join(self.byte_encoder[ord(b)] for b in token)
File "/home/ubuntu/anaconda3/envs/pytorch_april_patched/lib/python3.6/site-packages/pytorch_pretrained_bert/tokenization_gpt2.py", line 224, in <genexpr>
token = ''.join(self.byte_encoder[ord(b)] for b in token)
KeyError: 8212
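For what it's worth, character code 8212 is the em dash (U+2014), which shows up frequently in Wikipedia text. The old tokenize loop iterates over a str, so ord(b) yields Unicode code points, while the byte encoder only maps the 256 byte values 0-255; any non-ASCII character overflows the table. A minimal sketch of the failure mode (the byte_encoder dict below is a stand-in for illustration, not the library's actual bytes_to_unicode table):

```python
# Stand-in for GPT-2's byte-to-unicode map: keys are byte values 0..255 only.
byte_encoder = {i: chr(i) for i in range(256)}

token = "wiki\u2014text"  # contains an em dash, ord == 8212

# Buggy pattern from tokenization_gpt2.py line 224: iterating a str yields
# code points, so the em dash looks up key 8212, which does not exist.
try:
    "".join(byte_encoder[ord(b)] for b in token)
except KeyError as e:
    print(e)  # 8212

# Encoding to UTF-8 first yields raw bytes (each < 256), so every key exists;
# the em dash expands to its three UTF-8 bytes.
encoded = "".join(byte_encoder[b] for b in token.encode("utf-8"))
```

Newer releases of the tokenizer apply exactly this str-to-UTF-8-bytes step, so upgrading pytorch_pretrained_bert (or pre-filtering non-ASCII punctuation from the WikiExtractor output) may be the quickest workaround.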