How to use variable-sized input with DataLoader

My input is variable-sized, such as torch.Size([4, 2, 100]), torch.Size([4, 2, 100]), torch.Size([5, 2, 100]), torch.Size([6, 2, 100]), torch.Size([6, 2, 100])

I want every batch to contain inputs of the same size; for example, the two torch.Size([4, 2, 100]) inputs should be in the same batch.

My current method is as follows:

  1. Sort the inputs by the size of the first dimension.
  2. Divide them into a list of tuples, one tuple per batch, like [(Size([4, 2, 100]), Size([4, 2, 100])), (Size([5, 2, 100]), Size([5, 2, 100]))]. The batch size here is 2.
  3. Shuffle the indices of the list.
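The three steps above can be sketched as a batch sampler passed to DataLoader, so that batching still happens by size while loading goes through the DataLoader machinery. This is only a minimal sketch: `DialogueDataset` and `SameLengthBatchSampler` are hypothetical names, the random tensors stand in for the real data, and it groups by exact first-dimension size rather than sorting.

```python
import random
from collections import defaultdict

import torch
from torch.utils.data import DataLoader, Dataset, Sampler


class DialogueDataset(Dataset):
    """Hypothetical dataset of tensors shaped [context_len, 2, 100]."""

    def __init__(self, tensors):
        self.tensors = tensors

    def __len__(self):
        return len(self.tensors)

    def __getitem__(self, idx):
        return self.tensors[idx]


class SameLengthBatchSampler(Sampler):
    """Yields lists of indices whose samples share the same first dimension."""

    def __init__(self, lengths, batch_size, shuffle=True):
        self.batch_size = batch_size
        self.shuffle = shuffle
        # Step 1: group sample indices by context length.
        self.buckets = defaultdict(list)
        for idx, length in enumerate(lengths):
            self.buckets[length].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.buckets.values():
            indices = list(indices)
            if self.shuffle:
                random.shuffle(indices)
            # Step 2: cut each bucket into chunks of at most batch_size.
            for start in range(0, len(indices), self.batch_size):
                batches.append(indices[start:start + self.batch_size])
        if self.shuffle:
            # Step 3: shuffle the order of the batches.
            random.shuffle(batches)
        return iter(batches)

    def __len__(self):
        # Number of batches: ceiling division per bucket.
        bs = self.batch_size
        return sum((len(b) + bs - 1) // bs for b in self.buckets.values())


sizes = [4, 4, 5, 6, 6]
data = [torch.randn(n, 2, 100) for n in sizes]  # stand-in data
sampler = SameLengthBatchSampler(sizes, batch_size=2)
loader = DataLoader(DialogueDataset(data), batch_sampler=sampler)

# Every batch stacks cleanly because all its samples have the same shape.
batch_shapes = [tuple(batch.shape) for batch in loader]
```

Because each batch only contains samples of one size, the default collate function can stack them and no padding is needed. The trade-off is that rare sizes produce batches smaller than the requested batch size.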

Although my method works, I want to speed up the program by using DataLoader.

Does an input with a smaller original size (such as Size([1, 2, 100])) affect training when I pad all the inputs to the same size (such as Size([10, 2, 100]))?

Thank you.

Can you elaborate a bit more on the use case?

Yes, you can always pad the inputs to a standard size to get them working with DataLoader.
Alternatively, you can write a custom collate function to handle differently sized data instances.
I am not sure of your use case, though.
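As a rough illustration of the custom collate idea, the sketch below keeps each batch as a plain list of tensors instead of stacking them. The name `list_collate` and the random toy data are assumptions made for the example, not part of the original setup.

```python
import torch
from torch.utils.data import DataLoader


def list_collate(samples):
    """Return the samples as a plain list instead of stacking them."""
    return list(samples)


# Toy data: a Python list works as a map-style dataset.
data = [torch.randn(n, 2, 100) for n in (4, 4, 5, 6, 6)]
loader = DataLoader(data, batch_size=2, collate_fn=list_collate)

for batch in loader:
    # Each batch is a list of tensors with possibly different context lengths.
    print([tuple(t.shape) for t in batch])
```

With this approach the model has to loop over the list or pad inside the forward pass, so it is simple but does not by itself give one dense tensor per batch.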

My input is a multi-turn dialogue, and its size is [context length, utterance length, embedding size].
However, the context length varies.

A simple way would be to use a custom collate function that collects the variable-sized data into a batch.
There are also more advanced solutions based on padded sequences.
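One possible reading of the padded-sequence route is a collate function built on `torch.nn.utils.rnn.pad_sequence`, which pads along the first dimension. The sketch below is an assumption about how that could look for [context length, 2, 100] inputs; `pad_collate` and the returned mask layout are hypothetical choices.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader


def pad_collate(samples):
    """Pad samples along the context dimension and report what was padded."""
    lengths = torch.tensor([s.shape[0] for s in samples])
    # Result shape: [batch, max_context_len, 2, 100]; padding is zeros.
    padded = pad_sequence(samples, batch_first=True, padding_value=0.0)
    # padding_mask[b, t] is True where position t of sample b is padding.
    positions = torch.arange(padded.shape[1])
    padding_mask = positions.unsqueeze(0) >= lengths.unsqueeze(1)
    return padded, lengths, padding_mask


data = [torch.randn(n, 2, 100) for n in (4, 4, 5, 6, 6)]  # stand-in data
loader = DataLoader(data, batch_size=2, collate_fn=pad_collate)

for padded, lengths, padding_mask in loader:
    print(tuple(padded.shape), lengths.tolist())
```

Padding only to the longest sample in each batch, rather than to a global maximum, keeps the extra computation smaller. The mask is returned so that the model can ignore the padded positions.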

Thank you.

Does padding the sequences interfere with training?
I use a Transformer model.
For example,

It would increase the computation. Apart from that, you may have to set the logits of the padded entries to a large negative value, so that they don't contribute to the attention probabilities.
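To make the masking step concrete, here is a minimal sketch of setting the attention logits of padded keys to negative infinity before the softmax. The function name and tensor layout are assumptions for illustration; it also assumes every sample has at least one real (non-padded) position, since a fully masked row would produce NaN.

```python
import torch


def masked_attention_weights(scores, padding_mask):
    """Softmax over keys, ignoring padded key positions.

    scores:       [batch, query_len, key_len] raw attention logits
    padding_mask: [batch, key_len], True where the key is padding
    """
    # Broadcast the mask over the query dimension and suppress padded keys.
    scores = scores.masked_fill(padding_mask.unsqueeze(1), float("-inf"))
    return torch.softmax(scores, dim=-1)


scores = torch.zeros(1, 2, 3)                       # uniform logits
padding_mask = torch.tensor([[False, False, True]])  # last key is padding
weights = masked_attention_weights(scores, padding_mask)
print(weights[0, 0])  # tensor([0.5000, 0.5000, 0.0000])
```

With the built-in modules you usually do not write this by hand: `nn.MultiheadAttention` accepts a `key_padding_mask` argument and `nn.TransformerEncoder` accepts `src_key_padding_mask`, both using True for positions to ignore.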
