I’m trying to make my preprocessing life a bit more convenient by checking out torchtext, but I’m a bit confused about how it works. Say I run this code:
```python
train, val, test = torchtext.datasets.SST.splits(
    TEXT, LABEL, filter_pred=lambda ex: ex.label != 'neutral')
train_iter, val_iter, test_iter = torchtext.data.BucketIterator.splits(
    (train, val, test), batch_size=10, device=-1)
```
How come when I look at the batches, the sentence length is the same within each batch (e.g. one batch is 10x11, another is 10x7, etc.)? Is it somehow grouping each batch by sentence length? Is it padding some sentences? (I don’t see the padding.)
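To make my guess concrete, here's a plain-Python sketch of the behavior I *suspect* is happening (this is just my own illustration, not torchtext internals): sort sentences by length, batch them, then pad each batch only up to that batch's own max length.

```python
# Hypothetical sketch of what I think BucketIterator might be doing.
# All names here are my own illustration, not torchtext's actual code.

def bucket_batches(sentences, batch_size, pad_token="<pad>"):
    # Sort by length so similarly sized sentences land in the same batch
    ordered = sorted(sentences, key=len)
    batches = []
    for i in range(0, len(ordered), batch_size):
        batch = ordered[i:i + batch_size]
        max_len = max(len(s) for s in batch)
        # Pad every sentence in the batch up to the batch's own max length
        padded = [s + [pad_token] * (max_len - len(s)) for s in batch]
        batches.append(padded)
    return batches

sents = [["a"], ["a", "b"], ["a", "b", "c"], ["a", "b", "c", "d"]]
for b in bucket_batches(sents, batch_size=2):
    print([len(s) for s in b])  # every sentence in a batch has the same length
```

If that's roughly right, it would explain why each batch has a uniform length, but I still don't see where the pad tokens show up when I inspect the batches.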
Thanks for any help - beginner at this…