Hi, I'm working with my own data and struggling to create a dataset with torchtext.
Each example is a list of tokenized sentences,
e.g. [["He", "plays", "piano"], ["He", "plays", "guitar", "well"], ...]
I would like to randomly choose one of them for training.
So I defined the Field for them as follows:
n_samples_field = Field(use_vocab=True,
                        eos_token=SpecialToken.EOS.value,
                        pad_token=SpecialToken.Padding.value,
                        unk_token=SpecialToken.Unknown.value,
                        preprocessing=lambda sen: random.choice(sen) if sen != [] else [""],
                        include_lengths=True)
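To make the intent of the preprocessing step concrete, here is a minimal plain-Python sketch of what the lambda is meant to do, run outside torchtext. The helper name `choose_sentence` is hypothetical and only used for illustration.

```python
import random

def choose_sentence(sentences):
    """Return one tokenized sentence chosen at random.

    Falls back to [""] when the example has no sentences,
    mirroring the lambda passed to `preprocessing`.
    """
    return random.choice(sentences) if sentences != [] else [""]

# Example data in the same shape as described in this post.
example = [["He", "plays", "piano"], ["He", "plays", "guitar", "well"]]
chosen = choose_sentence(example)   # one of the two token lists
fallback = choose_sentence([])      # [""] for an empty example
```

So after preprocessing, each example should be a flat list of string tokens, not yet integers.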
But iterating over the batches fails with the traceback below.
I think I need to convert each sentence into ints after preprocessing, but how?
  File "/home/ubuntu/test/train.py", line 132, in run
    for batch in X:
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/iterator.py", line 156, in __iter__
    yield Batch(minibatch, self.dataset, self.device)
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/batch.py", line 34, in __init__
    setattr(self, name, field.process(batch, device=device))
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/field.py", line 237, in process
    tensor = self.numericalize(padded, device=device)
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in numericalize
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
  File "/home/ubuntu/anaconda3/envs/test_env/lib/python3.7/site-packages/torchtext/data/field.py", line 336, in <listcomp>
    arr = [[self.vocab.stoi[x] for x in ex] for ex in arr]
AttributeError: 'Field' object has no attribute 'vocab'
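
The failing line is the lookup `self.vocab.stoi[x]`, which maps each token string to an integer. The sketch below shows that token-to-int step in plain Python. It is not torchtext's implementation, and the special token strings `<unk>`, `<pad>` and `<eos>` as well as the helper names are assumptions made for illustration (the actual values of `SpecialToken` are not shown in this post).

```python
from collections import Counter

def build_stoi(sentences, specials=("<unk>", "<pad>", "<eos>")):
    """Build a string-to-index mapping; special tokens get the first indices."""
    counts = Counter(tok for sent in sentences for tok in sent)
    # Most frequent tokens first, ties broken alphabetically.
    itos = list(specials) + sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(itos)}

def numericalize(sentence, stoi, unk="<unk>"):
    """Convert a list of tokens to ints, mapping unseen tokens to the unk index."""
    return [stoi.get(tok, stoi[unk]) for tok in sentence]

sentences = [["He", "plays", "piano"], ["He", "plays", "guitar", "well"]]
stoi = build_stoi(sentences)
ids = numericalize(["He", "plays", "drums"], stoi)  # "drums" is unseen
```

In torchtext itself, the `vocab` attribute is normally created by calling `build_vocab` on the field before iterating, for example `n_samples_field.build_vocab(train_dataset)`, where `train_dataset` is a hypothetical name for the dataset built with this field. If that call is missing, it would be a likely cause of the `AttributeError` above.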