What is the batch.text tuple composed of?

I am using the iterators generated by torchtext and am wondering about the batch.text tuple - the first element of the tuple seems to hold a batch of word indices (judging by its size, [batchsize * sentence_length]), but the second element is a mystery to me - it is of length batchsize, but what are the values in there? Is this covered in the docs somewhere I couldn't find?

    for idx, batch in enumerate(val_iter):
        text = batch.text[0]    # this is the first element of the batch.text tuple - what is the second??

It is difficult to answer the question without more context. For example, a quick Google search of your code snippet brings up this snippet of code

    for b, batch in enumerate(train_iter):
        x, y = batch.text, batch.target

from where it’s easy to make out what is what. A similar context (source code or task) will help answer your question.

Here’s some more context - the iterator is coming from the torchtext BucketIterator. The label has its own field, batch.label, so that is not it. The values I see are integers, as though a single numericalized sentence from the batch were being sent as the mystery second tuple element.

fields = {'text': ('text', TEXT), 'label': ('label', LABEL)}
train_data, test_data, validate_data = data.TabularDataset.splits(
        path=path,
        train=trainfile,
        test=testfile,
        validation=validationfile,
        format='csv',  #csv_reader_params=
        fields=fields
    )
train_iterator, test_iterator, valid_iterator = data.BucketIterator.splits(
        (train_data, test_data,validate_data),
        batch_size=BATCH_SIZE,
        sort_key=lambda x: len(x.text),
        device=device)
for idx, batch in enumerate(valid_iterator):
        text = batch.text[0]    # this is the first element of the batch.text tuple - what is the second??
        target = batch.label

The length of each sentence. batch.text is a (padded word indices, lengths) tuple - most likely because your TEXT field was created with include_lengths=True - and the second element holds the true (unpadded) length of each sentence in the batch. These lengths are typically passed to nn.utils.rnn.pack_padded_sequence so the RNN skips the padding.
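To make that concrete, here is a minimal pure-Python sketch of what the field does when building a batch - the pad_batch helper and the pad index 1 are assumptions for illustration, not torchtext code:

```python
def pad_batch(sequences, pad_idx=1):
    """Pad numericalized sentences to a common length and record each
    sentence's true length - mimicking the (padded, lengths) tuple that
    torchtext's Field(include_lengths=True) produces as batch.text."""
    lengths = [len(seq) for seq in sequences]
    max_len = max(lengths)
    padded = [seq + [pad_idx] * (max_len - len(seq)) for seq in sequences]
    return padded, lengths

# three numericalized sentences of different lengths
batch = [[4, 9, 2], [7, 3], [5, 8, 6, 2]]
padded, lengths = pad_batch(batch)
# padded  -> [[4, 9, 2, 1], [7, 3, 1, 1], [5, 8, 6, 2]]
# lengths -> [3, 2, 4]
```

In real torchtext code both elements come back as tensors, and the lengths tensor is what you would hand to pack_padded_sequence.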


Thanks - is this referred to in the docs somewhere? I couldn't find much info on the iterators themselves.

Same here. In the past, I too struggled with the lack of good documentation for torchtext, but my go-to resources were the following, plus some deep digging :wink:

  1. http://anie.me/On-Torchtext/ (the answer to your question is in here)
  2. https://mlexplained.com/2018/02/08/a-comprehensive-tutorial-to-torchtext/