Thanks!
I have now changed my code snippet to:
import torch
from torchtext.data import Dataset, Example, Field
from torchtext.data import Iterator, BucketIterator

TEXT = Field(sequential=True, tokenize=lambda x: x.split(),
             lower=True, use_vocab=True)
# The labels are strings ("a", "b"), so they need a vocab to become integers
LABEL = Field(sequential=False, use_vocab=True)
data = [("shop street mountain is hight", "a"),
        ("work is interesting", "b")]
FIELDS = [('text', TEXT), ('category', LABEL)]
examples = list(map(lambda x: Example.fromlist(list(x), fields=FIELDS),
                    data))
dt = Dataset(examples, fields=FIELDS)
TEXT.build_vocab(dt, vectors="glove.6B.100d")
# Pretrained word vectors are not needed for class labels
LABEL.build_vocab(dt)
print(TEXT.vocab.stoi["is"])
# An Example has no length of its own; sort by the number of tokens in its text
data_iter = Iterator(dt, batch_size=4, sort_key=lambda x: len(x.text))
But now I have another question:
How do I transform the text data in dt or data_iter into a numerical format suitable for feeding into the model?
I now have an iterator over ‘dt’, but as far as I can tell it holds the text field as raw text, not as numerical torch tensors.
As I understand it, the TEXT field now holds the mapping from tokens to integer indices (TEXT.vocab.stoi), but I need to use dt or data_iter as the input to the model.
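To make it concrete, here is a rough sketch of the kind of output I am after, written in plain Python without torchtext. The helpers build_stoi and numericalize are names I made up for illustration, not torchtext functions, and the indices are assigned in order of first appearance, which is probably not how TEXT.vocab orders them.

```python
# Plain-Python sketch of "text -> padded rows of indices".
# build_stoi and numericalize are hypothetical helpers, not torchtext API.

def build_stoi(sentences, specials=("<unk>", "<pad>")):
    """Map each lower-cased token to an integer index, special tokens first."""
    stoi = {tok: i for i, tok in enumerate(specials)}
    for sentence in sentences:
        for tok in sentence.lower().split():
            # Assign the next free index the first time a token is seen
            stoi.setdefault(tok, len(stoi))
    return stoi


def numericalize(sentences, stoi, unk="<unk>", pad="<pad>"):
    """Convert sentences to equal-length rows of token indices."""
    rows = [[stoi.get(tok, stoi[unk]) for tok in s.lower().split()]
            for s in sentences]
    width = max(len(row) for row in rows)
    # Right-pad shorter rows so the batch is rectangular
    return [row + [stoi[pad]] * (width - len(row)) for row in rows]


sentences = ["shop street mountain is hight", "work is interesting"]
stoi = build_stoi(sentences)
batch = numericalize(sentences, stoi)
print(batch)
```

A rectangular list like this could then be wrapped in torch.tensor(batch) to get a LongTensor; what I want is to get the equivalent directly from dt or data_iter.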
Update: I have reformulated this as a separate question: Creating input for the model from the raw text