Minibatch training for inputs of variable sizes

I have a list of LongTensors and another list of labels. I'm new to PyTorch and RNNs, so I'm quite confused as to how to implement minibatch training for the data I have. There is much more to this data, but I want to keep it simple so I can focus on understanding only the minibatch training part. I'm doing multiclass classification based on the final hidden state of an LSTM/GRU trained on variable-length inputs. I managed to get it working with batch size 1 (basically SGD), but I'm struggling to implement minibatches.

Do I have to pad the sequences to the maximum size and create a new tensor matrix of larger size which holds all the elements? I mean like this:

import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, TensorDataset

inputs = pad_sequence(sequences, batch_first=True)  # pad to the longest sequence
train = DataLoader(TensorDataset(inputs, torch.tensor(labels)),
                   batch_size=batch_size, shuffle=True)
for i, (data, target) in enumerate(train):
    # do stuff using LSTM and/or GRU models

Is this the accepted way of doing minibatch training on custom data? I couldn't find any tutorials on loading custom data using DataLoader (but I assume that's the way to create batches in PyTorch?).

Another doubt I have concerns padding. The reason I'm using an LSTM/GRU is the variable length of the input. Doesn't padding defeat that purpose? My question is basically: is padding necessary for minibatch training?

Yes, you will have to pad your input sequences to implement minibatch training.

Essentially, minibatching works by packing a bunch of input tensors into another tensor of one higher dimension, for computational efficiency. As an example, three separate input tensors of size 10 can be stacked together as a minibatch into a tensor of size 3 x 10. Since tensors of different lengths cannot be stacked together, you need to pad all input tensors to be of the same length. Notice that this is a requirement of the minibatch technique, not of the RNN per se, so it doesn't defeat the purpose.
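As a minimal sketch of the stacking described above (the toy tensors here are illustrative, not from your data), `torch.nn.utils.rnn.pad_sequence` pads a list of variable-length tensors to the longest one and stacks them into a single batch tensor:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three variable-length sequences of token ids
seqs = [torch.ones(10, dtype=torch.long),
        torch.ones(7, dtype=torch.long),
        torch.ones(4, dtype=torch.long)]

# Pad each to the longest sequence (10), then stack: shape (3, 10)
batch = pad_sequence(seqs, batch_first=True, padding_value=0)
print(batch.shape)  # torch.Size([3, 10])
```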

Also notice that the sequences only have to be of the same length within a particular minibatch, not across the whole dataset. So, say you have 10 input examples with the following lengths: [10, 10, 59, 60, 28, 30, 97, 100, 3, 5]. You can group consecutive pairs into 5 batches whose padded lengths are [10, 60, 30, 100, 5]. I guess you should be more convinced now.
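The per-batch padding above can be sketched like this (using zero-filled dummy sequences of those 10 lengths, grouped in pairs as in the example):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

lengths = [10, 10, 59, 60, 28, 30, 97, 100, 3, 5]
seqs = [torch.zeros(n, dtype=torch.long) for n in lengths]

batch_size = 2
# Pad only within each batch, so each batch is only as long as its longest member
batches = [pad_sequence(seqs[i:i + batch_size], batch_first=True)
           for i in range(0, len(seqs), batch_size)]

print([b.shape[1] for b in batches])  # padded lengths: [10, 60, 30, 100, 5]
```

In practice people often sort the dataset by length first, so that sequences of similar size end up in the same batch and the amount of wasted padding is minimized.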

For the implementation, see and the related discussion at .
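One common implementation of this (a sketch, not necessarily identical to the one linked) uses `torch.nn.utils.rnn.pack_padded_sequence`, which lets the RNN skip the padded timesteps so each final hidden state corresponds to the sequence's true last element. The sizes and toy data here are hypothetical:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pad_sequence, pack_padded_sequence

# Hypothetical batch: three sequences of token ids with different lengths
seqs = [torch.randint(1, 20, (n,)) for n in (12, 7, 4)]
lengths = torch.tensor([len(s) for s in seqs])

embed = nn.Embedding(20, 8, padding_idx=0)
lstm = nn.LSTM(8, 16, batch_first=True)

padded = pad_sequence(seqs, batch_first=True)              # (3, 12)
packed = pack_padded_sequence(embed(padded), lengths,
                              batch_first=True, enforce_sorted=False)
_, (h_n, _) = lstm(packed)                                 # h_n: (1, 3, 16)

# h_n[-1] holds the final hidden state of each sequence, taken at its
# true length (padding is never fed through the LSTM)
print(h_n[-1].shape)  # torch.Size([3, 16])
```

`h_n[-1]` is then exactly the per-sequence final hidden state you'd feed into a classification layer.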

Explained in detail here:

I have done all the steps you've outlined, except possibly for the masking. I think I tried masking the output, but perhaps it wasn't done very well. I'll try again. Thank you for a really well-written tutorial! This ought to be baked into the standard functions of PyTorch!
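For reference, one way to do that output masking (a sketch with made-up shapes; an alternative to packing when you run the RNN on the padded batch directly) is to gather each sequence's output at its true last timestep:

```python
import torch

# Hypothetical padded RNN output: (batch=3, max_len=5, hidden=4)
out = torch.arange(60, dtype=torch.float).view(3, 5, 4)
lengths = torch.tensor([5, 2, 3])  # true length of each sequence

# Index of each sequence's last real timestep, expanded to gather full
# hidden vectors: (3, 1, 4)
idx = (lengths - 1).view(-1, 1, 1).expand(-1, 1, out.size(2))
last = out.gather(1, idx).squeeze(1)  # (3, 4): one hidden vector per sequence
print(last.shape)  # torch.Size([3, 4])
```

This ignores whatever the RNN produced on the padded positions, which is the point of the masking step.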