Hi guys, I am facing a problem with the torchtext package. In the data building phase, I created a text field using data.Field and built the vocabulary from the training data:
shared_text_field = data.Field(sequential=True, tokenize=self.tokenizer.tokenize,
Unfortunately, when I tried to use it with my test data, I got a KeyError (sorry for the truncated error message):
File "/Users/aryopg/.local/share/virtualenvs/learning-Y_vf_ZaD/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
Did I do something wrong, or is this not supported at the moment? (It would be a little bizarre if it’s not supported yet.) I’ll be very happy to provide more details. Thanks!
There is a similar issue here but with a whitespace being looked up: Out-of-vocabulary KeyError on vocab.stoi in TorchText
Even if the comment there is correct, that a whitespace shouldn’t really be part of your vocabulary, shouldn’t it be mapped to the unknown token by default? Unfortunately, nobody there really answers the question about OOV (out-of-vocabulary) words.
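To make the expectation above concrete, here is a minimal sketch of a token-to-index mapping that falls back to the unknown token instead of raising a KeyError. It does not use torchtext and is not its actual implementation; the names build_stoi and numericalize, the special tokens, and the toy sentences are all assumptions made for illustration.

```python
from collections import defaultdict

UNK, PAD = "<unk>", "<pad>"  # assumed special tokens, for illustration only


def build_stoi(train_tokens, specials=(UNK, PAD)):
    """Build a token -> index mapping that falls back to the <unk> index."""
    itos = list(specials)
    for tok in train_tokens:
        if tok not in itos:
            itos.append(tok)
    unk_index = itos.index(UNK)
    # A defaultdict returns unk_index for any token never seen in training.
    stoi = defaultdict(lambda: unk_index)
    stoi.update({tok: i for i, tok in enumerate(itos)})
    return stoi


def numericalize(batch, stoi):
    """Same nested lookup as in the traceback, but safe for unseen tokens."""
    return [[stoi[x] for x in ex] for ex in batch]


stoi = build_stoi("the cat sat on the mat".split())
ids = numericalize([["the", "dog", "sat"]], stoi)
print(ids)  # "dog" was never seen, so it maps to the <unk> index 0
```

With a plain dict in place of the defaultdict, the same lookup on "dog" raises a KeyError, which is the behavior reported in this thread. Note that a defaultdict lookup also inserts the missing key into the mapping; using stoi.get(x, unk_index) on a plain dict avoids that side effect.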
Issue without much discussion: https://github.com/pytorch/text/issues/337
… Maybe you can revive the issue or create a new one.
Thanks for the response and for pointing out those links! Unfortunately, I am quite aware of the unresponsiveness.
This may not be the place to ask, but is there any other library (other than torchtext) that is more robust and better maintained? PS: Hopefully this will ring a little bell for the developers lol
There are bugs in torchtext. Try what you did again after installing it straight from the GitHub repository, like this: pip install --upgrade git+https://github.com/pytorch/text
Are there any answers to the OOV issue so far?