Using variable sized input - Is padding required?

My input has variable size. I haven’t found a way to use a DataLoader without padding the inputs to the maximum size in the batch. Is there any way around this? Is it possible without using the DataLoader class?


A batch must always consist of elements of the same size. However, if your inputs are large enough for the network and you can handle the corresponding output sizes, you can feed batches of different sizes to the model.
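A small sketch of the first point (the shapes are arbitrary, chosen only for illustration): `torch.stack`, which the default collate function uses for tensors, accepts equally sized samples and raises a `RuntimeError` when the sizes differ.

```python
import torch

# Two samples with identical shapes can be stacked into one batch tensor.
a = torch.rand(3, 32, 32)
b = torch.rand(3, 32, 32)
batch = torch.stack([a, b])  # shape: (2, 3, 32, 32)

# A sample with a different spatial size cannot share a batch tensor with them.
c = torch.rand(3, 40, 40)
try:
    torch.stack([a, c])
    stack_failed = False
except RuntimeError:
    stack_failed = True

print(batch.shape, stack_failed)
```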


How can I vary the batch size?

This does not work with the default DataLoader, but in general you could handle the loading yourself: add a batch dimension to each sample and use torch.cat to concatenate them into a batch:

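import torch

curr_batch_size = 4  # number of samples in this batch; can differ from batch to batch
batch_elements = []

for i in range(curr_batch_size):
    # generate some sample data
    tmp = torch.rand((1, 50, 50))
    # add batch dimension
    tmp = tmp.unsqueeze(0)
    batch_elements.append(tmp)
batch = torch.cat(batch_elements, 0)  # shape: (curr_batch_size, 1, 50, 50)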

Replace tmp = torch.rand((1, 50, 50)) with your own data samples. In this case I used 50x50-pixel single-channel images as sample data. To show the workflow with general data, I did not include the batch dimension in the shape of the random tensor but added it afterwards.
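A related approach that keeps the DataLoader, sketched here under the assumption that your samples can be grouped by exact shape: pass a `batch_sampler` that only puts equally sized samples into the same batch, so the default collate function can stack them without any padding, and batch sizes naturally vary. `VariableSizeImages` and `make_size_buckets` are hypothetical names made up for this example.

```python
import torch
from collections import defaultdict
from torch.utils.data import DataLoader, Dataset


class VariableSizeImages(Dataset):
    """Hypothetical dataset: single-channel square images of a few different sizes."""

    def __init__(self, sizes):
        self.images = [torch.rand(1, s, s) for s in sizes]

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        return self.images[idx]


def make_size_buckets(dataset, max_batch_size):
    """Group indices by sample shape, then split each group into batches."""
    groups = defaultdict(list)
    for idx in range(len(dataset)):
        groups[tuple(dataset[idx].shape)].append(idx)
    batches = []
    for indices in groups.values():
        for start in range(0, len(indices), max_batch_size):
            batches.append(indices[start:start + max_batch_size])
    return batches


dataset = VariableSizeImages([50, 50, 64, 50, 64, 32])
# Each list of indices yielded by the batch_sampler holds equally sized samples,
# so the default collate_fn can stack them without padding.
loader = DataLoader(dataset, batch_sampler=make_size_buckets(dataset, max_batch_size=2))

for batch in loader:
    print(batch.shape)
```

With these sizes the loader yields batches of shape (2, 1, 50, 50), (1, 1, 50, 50), (2, 1, 64, 64) and (1, 1, 32, 32). In practice you would probably shuffle within and across buckets each epoch.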

EDIT: Alternatively you could use something like this. But note that this will pad your input, and depending on the maximum difference between your input sizes, you could end up padding an enormous number of zeros (or constants, or whatever padding value you use).
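To get a feel for how much padding can cost, here is a minimal sketch that pads (C, H, W) images on the bottom and right to the largest height and width in the batch using `torch.nn.functional.pad`, then measures the fraction of the batch that is padding. The helper `pad_to_largest` is hypothetical, and bottom/right padding is just one possible choice.

```python
import torch
import torch.nn.functional as F


def pad_to_largest(images, value=0.0):
    """Pad (C, H, W) tensors on the bottom/right to the largest H and W, then stack."""
    max_h = max(img.shape[1] for img in images)
    max_w = max(img.shape[2] for img in images)
    padded = []
    for img in images:
        # For the last two dims, F.pad takes (left, right, top, bottom).
        pad = (0, max_w - img.shape[2], 0, max_h - img.shape[1])
        padded.append(F.pad(img, pad, value=value))
    return torch.stack(padded)


images = [torch.ones(1, 10, 10), torch.ones(1, 40, 40)]
batch = pad_to_largest(images)

real_values = sum(img.numel() for img in images)
padding_fraction = 1 - real_values / batch.numel()
print(batch.shape, padding_fraction)
```

Here almost half of the batch (0.46875) is padding, because one small image sits next to a much larger one.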


How exactly are you doing this? Your answer seems to assume that everything is already the same size.

The original question is whether padding is required for variable-size input. Let’s answer that directly: is padding always necessary or not? When is it necessary and when is it not?

To my understanding it’s always required, because batches have to be of the same size (unless I’m misunderstanding something or missing something).

Items in the same batch have to be the same size, yes, but with a fully convolutional network you can pass batches of different sizes, so no, padding is not always required. In the extreme case you could even use a batch size of 1, and your input size could be completely random (assuming that you have adjusted strides, kernel size, dilation, etc. appropriately).

This is why it is hard to answer this question in general.
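A minimal sketch of the fully convolutional case, using a small, made-up network: since it contains no `nn.Linear` layers, nothing in it hard-codes the spatial size, so two batches that differ in both batch size and image size can go through the same model. The optional `nn.AdaptiveAvgPool2d(1)` at the end is one way to turn the variable-size output into a fixed-size vector if you need one.

```python
import torch
import torch.nn as nn

# Hypothetical fully convolutional network (no Linear layers).
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 2, kernel_size=1),
)

# Within each batch all samples share one size, but the batches differ.
batch_a = torch.rand(4, 1, 50, 50)
batch_b = torch.rand(1, 1, 37, 81)

out_a = model(batch_a)  # spatial size roughly halved by the stride-2 conv
out_b = model(batch_b)

# Optional: pool to a fixed size regardless of the input resolution.
pooled_b = nn.AdaptiveAvgPool2d(1)(out_b).flatten(1)  # shape: (1, 2)
print(out_a.shape, out_b.shape, pooled_b.shape)
```

The output spatial size follows the input size ((25, 25) for 50x50 and (19, 41) for 37x81 here), which is what "handling the corresponding output sizes" refers to.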


My take on how to solve this issue:

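import torch

# device is not defined in the snippet itself; defined here so it runs.
# Note: moving data to CUDA inside a collate_fn can fail when num_workers > 0.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def collate_fn_padd(batch):
    '''
    Pads a batch of variable-length sequences.

    note: it converts things to tensors manually here since the ToTensor transform
    assumes it takes in images rather than arbitrary tensors.
    '''
    ## convert first, so lengths can be read from lists, arrays or tensors alike
    batch = [ torch.as_tensor(t, dtype=torch.float32).to(device) for t in batch ]
    ## get sequence lengths
    lengths = torch.tensor([ t.shape[0] for t in batch ]).to(device)
    ## pad (output is max_len x batch_size x *, since batch_first=False by default)
    batch = torch.nn.utils.rnn.pad_sequence(batch)
    ## compute mask (True for non-zero entries; genuine zeros also count as padding)
    mask = (batch != 0).to(device)
    return batch, lengths, mask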
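A possible usage sketch, assuming CPU-only data made of sequences shaped (length, features): it uses a self-contained variant of the collate function above with `batch_first=True` (a deliberate change from the default) and feeds the padded batch to an LSTM through `pack_padded_sequence`, which is one common way to use the returned lengths. The data and layer sizes are made up for illustration.

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_sequence
from torch.utils.data import DataLoader


def collate_fn_padd(batch):
    """CPU variant of the collate function above: pad, return lengths and mask."""
    batch = [torch.as_tensor(t, dtype=torch.float32) for t in batch]
    lengths = torch.tensor([t.shape[0] for t in batch])
    padded = pad_sequence(batch, batch_first=True)  # (batch, max_len, features)
    mask = padded != 0
    return padded, lengths, mask


# Hypothetical data: three sequences of 2-dimensional feature vectors.
sequences = [torch.rand(5, 2), torch.rand(3, 2), torch.rand(7, 2)]
loader = DataLoader(sequences, batch_size=3, collate_fn=collate_fn_padd)

rnn = torch.nn.LSTM(input_size=2, hidden_size=4, batch_first=True)
for padded, lengths, mask in loader:
    # Packing lets the LSTM skip the padded time steps of the shorter sequences.
    packed = pack_padded_sequence(padded, lengths, batch_first=True, enforce_sorted=False)
    output, (h_n, c_n) = rnn(packed)
    print(padded.shape, lengths.tolist(), h_n.shape)
```

Since `padded != 0` also marks genuine zeros in the data as padding, a mask built from `lengths` (for example `torch.arange(padded.shape[1])[None, :] < lengths[:, None]`) is more robust if your inputs can contain zeros.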

There seems to be a large collection of posts about this issue all over the PyTorch forums, which makes it difficult to find a solution. I have collected a list of all of them, hopefully making things easier for all of us. Here:


Stack Overflow also has a version of this question: