How to use variable-sized inputs with DataLoader

My inputs are variable-sized, e.g. torch.Size([4, 2, 100]), torch.Size([4, 2, 100]), torch.Size([5, 2, 100]), torch.Size([6, 2, 100]), torch.Size([6, 2, 100]).

I want all samples in a batch to have the same size; for example, the two torch.Size([4, 2, 100]) inputs should end up in the same batch.

My original method is as follows (see the sketch after the list):

  1. Sorting the inputs by the size of the first dimension.
  2. Dividing the sorted inputs into a list of tuples of batch size, e.g. [(Size([4, 2, 100]), Size([4, 2, 100])), (Size([5, 2, 100]), Size([5, 2, 100]))] with a batch size of 2.
  3. Shuffling the order of the list.
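
A minimal sketch of this bucketing approach (the function name and details are my own, not an established API):

```python
import random
from itertools import groupby

import torch

def make_batches(inputs, batch_size=2):
    # 1. Sort by the size of the first (context-length) dimension.
    inputs = sorted(inputs, key=lambda t: t.shape[0])
    # 2. Group tensors with identical first-dimension size, then split
    #    each group into tuples of batch_size.
    batches = []
    for _, group in groupby(inputs, key=lambda t: t.shape[0]):
        group = list(group)
        for i in range(0, len(group), batch_size):
            batches.append(tuple(group[i:i + batch_size]))
    # 3. Shuffle the order of the batches.
    random.shuffle(batches)
    return batches

inputs = [torch.randn(4, 2, 100), torch.randn(4, 2, 100),
          torch.randn(5, 2, 100), torch.randn(6, 2, 100),
          torch.randn(6, 2, 100)]
for batch in make_batches(inputs):
    print([t.shape for t in batch])
```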

Although this method works, I want to speed up the program by using DataLoader.

Does an input with a smaller original size (such as Size([1, 2, 100])) get affected during training when I pad all the inputs to the same size (such as Size([10, 2, 100]))?

Thank you.

Can you elaborate a bit more on the use case?

Yes, you can always pad the inputs to a standard size to get them working with DataLoader.
Alternatively, you can write a custom collate function to handle different-sized data instances.
I am not sure of your use case, though.
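
A minimal sketch of such a collate function that simply returns the variable-sized samples as a list instead of stacking them (the names here are illustrative):

```python
import torch
from torch.utils.data import DataLoader

def list_collate(batch):
    # Skip the default stacking (which requires equal shapes) and
    # return the variable-sized tensors as a plain list.
    return list(batch)

# A plain list of tensors works as a map-style dataset.
dataset = [torch.randn(n, 2, 100) for n in (4, 4, 5, 6, 6)]
loader = DataLoader(dataset, batch_size=2, collate_fn=list_collate)

for batch in loader:
    print([t.shape for t in batch])
```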

My input is multi-turn dialogue, and its size is [context length, utterance length, embedding size].
However, the context length varies from sample to sample.

A simple way would be to use a custom collate function that collects the variable-sized data into a batch.
There are more advanced solutions using padded sequences as well.
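
For the padding route, a minimal sketch using torch.nn.utils.rnn.pad_sequence, which zero-pads the first (variable) dimension up to the batch maximum (the collate function name is my own):

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

def pad_collate(batch):
    # Each sample is [context_len, utterance_len, embed_size];
    # pad_sequence pads context_len to the longest sample in the batch.
    lengths = torch.tensor([t.shape[0] for t in batch])
    padded = pad_sequence(batch, batch_first=True)  # [B, max_ctx, utt, emb]
    return padded, lengths

dataset = [torch.randn(n, 2, 100) for n in (4, 6)]
loader = DataLoader(dataset, batch_size=2, collate_fn=pad_collate)
padded, lengths = next(iter(loader))
print(padded.shape)  # torch.Size([2, 6, 2, 100])
print(lengths)       # tensor([4, 6])
```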

Thank you.

Do padded sequences interfere with training?
I use a Transformer model.
For example:
[[[1, 2, 3, 4, 0, 0, 0], [2, 3, 5, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]], [[3, 2, 3, 4, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 0, 0]]]

It would increase the computation. Apart from that, you may have to set the attention logits of the padded entries to a large negative value, so that they don’t contribute to the attention probabilities.
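
A minimal sketch of such a padding mask using PyTorch's built-in encoder layer, which applies the large-negative masking internally via src_key_padding_mask (the sizes here are illustrative, and batch_first requires a reasonably recent PyTorch):

```python
import torch
import torch.nn as nn

batch_size, max_len, d_model = 2, 7, 100
x = torch.randn(batch_size, max_len, d_model)
lengths = torch.tensor([4, 3])  # true (unpadded) lengths per sample

# True marks padded positions that attention should ignore.
pad_mask = torch.arange(max_len)[None, :] >= lengths[:, None]  # [B, max_len]

layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
out = layer(x, src_key_padding_mask=pad_mask)  # padded keys are masked out
```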
