Please consider padding the input to a length of

Ernst_Oberortner · June 16, 2020, 1:13am

Hi, I’m using a BucketIterator with a batch size of 64 to iterate over my training and test datasets.

train_iterator, test_iterator = BucketIterator.splits(
    (train_data, test_data),
    batch_size=64,
    repeat=False,
    sort=True,
    sort_key=lambda x: len(x.prot_seq),
    device=device)

The dataset sizes are not multiples of 64, though.
When training, I’m getting the following error message:

If training, sequence Length 35 has to be a multiple of least common multiple chunk_length 64. Please consider padding the input to a length of 64

Shouldn’t the BucketIterator take care of always giving batches of 64 sequences?
Or is this somehow configurable? The docs are not very helpful in this regard.

By the way, I’m using a Reformer model:

from transformers import ReformerConfig, ReformerModelWithLMHead
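
For reference, here is a minimal sketch of what the error message seems to be asking for. As far as I can tell, the 35 in the message is the sequence length (number of tokens per example), not the number of examples in the batch, so the padding would go along the sequence dimension. The helper name `pad_to_multiple` and the padding id `0` are my own assumptions for illustration; they are not part of torchtext or transformers.

```python
import math


def pad_to_multiple(batch, multiple=64, pad_id=0):
    """Right-pad every sequence so the shared length is a multiple of `multiple`.

    `batch` is a list of token-id lists. Returns the padded sequences and a
    matching attention mask (1 = real token, 0 = padding).
    """
    longest = max(len(seq) for seq in batch)
    # Round the longest length up to the next multiple (at least one chunk).
    target = max(1, math.ceil(longest / multiple)) * multiple
    padded = [list(seq) + [pad_id] * (target - len(seq)) for seq in batch]
    mask = [[1] * len(seq) + [0] * (target - len(seq)) for seq in batch]
    return padded, mask


# Two toy sequences of lengths 35 and 20, as in the error message above.
batch = [list(range(1, 36)), list(range(1, 21))]
padded, mask = pad_to_multiple(batch)
print(len(padded[0]), len(padded[1]))  # 64 64
```

The mask would then be passed to the model alongside the padded ids so that the padding positions are ignored. If I read the torchtext API correctly, setting `fix_length` on the `Field` might achieve similar padding without a custom helper, but I have not verified that it fits this setup.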

Thanks for your help in advance!