I am using the iterators generated by torchtext and am wondering about the batch.text tuple. The first element of the tuple seems to be a batch of word indices (judging by its size, [batch_size, sentence_length]), but the second element is a mystery to me: it is of length batch_size, but what are the values in there? Is this covered in the docs somewhere I couldn't find?
for idx, batch in enumerate(val_iter):
text = batch.text[0] # this is the first element of the batch.text tuple - what is the second??
It is difficult to answer the question without more context. For example, a quick Google search of your code snippet brings up this snippet of code:
for b, batch in enumerate(train_iter):
x, y = batch.text, batch.target
from which it's easy to make out what is what. Similar context (source code or the task) would help answer your question.
Here's some more context: the iterator comes from torchtext's BucketIterator. The label has its own field (batch.label), so that is not it. The values I see are integers, as though a single numericalized sentence from the batch were being sent as the mystery second tuple element.
fields = {'text': ('text', TEXT), 'label': ('label', LABEL)}
train_data, test_data, validate_data = data.TabularDataset.splits(
path=path,
train=trainfile,
test=testfile,
validation=validationfile,
format='csv', #csv_reader_params=
fields=fields
)
train_iterator, test_iterator, valid_iterator = data.BucketIterator.splits(
    (train_data, test_data, validate_data),
batch_size=BATCH_SIZE,
sort_key=lambda x: len(x.text),
device=device)
for idx, batch in enumerate(valid_iterator):
text = batch.text[0] # this is the first element of the batch.text tuple - what is the second??
target = batch.label
Thanks! Is this referred to in the docs somewhere? I couldn't find much information on the iterators themselves.
Same here. In the past, I too struggled with the lack of good documentation for torchtext, but my go-to resources were the following, plus some deep digging:
- http://anie.me/On-Torchtext/ (the answer to your question is in here)
- https://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext/
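For reference, the behavior you're describing matches what torchtext produces when the text Field is constructed with include_lengths=True: batch.text then becomes a tuple of (padded index tensor, lengths), where the second element holds one integer per sentence, namely the unpadded length of that sentence. Here is a minimal pure-Python sketch of that pairing (no torchtext required; numericalize_batch is a hypothetical helper for illustration, not a torchtext function):

```python
def numericalize_batch(sentences, pad_idx=1):
    """Pad a batch of index lists to a common length and return
    (padded, lengths) - the same tuple shape batch.text holds when
    the Field was built with include_lengths=True."""
    lengths = [len(s) for s in sentences]          # one int per sentence
    max_len = max(lengths)
    padded = [s + [pad_idx] * (max_len - len(s)) for s in sentences]
    return padded, lengths

# Three "numericalized" sentences of different lengths:
batch = [[4, 9, 2], [7, 3], [5, 8, 6, 2]]
padded, lengths = numericalize_batch(batch)
# padded  -> [[4, 9, 2, 1], [7, 3, 1, 1], [5, 8, 6, 2]]
# lengths -> [3, 2, 4]   <- the "mystery" second tuple element
```

Those lengths are typically passed to torch.nn.utils.rnn.pack_padded_sequence so the RNN can skip the padding positions.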