How to create a dataloader with variable-size input

Hi, I’d like to create a dataloader with different size input images, but don’t know how to do that.

I’ve read the official tutorial on loading custom data (Writing Custom Datasets, DataLoaders and Transforms — PyTorch Tutorials 2.1.1+cu121 documentation). However, in the tutorial all the input images are rescaled to 256x256 and randomly cropped to 224x224.

I’d like to do a different image classification task, and in my task I want to keep the images at their original size while using a batch size greater than 1.

I noticed that there are related questions/issues, such as: [feature request] Support tensors of different sizes as batch elements in DataLoader · Issue #1512 · pytorch/pytorch · GitHub

However, none of the above gives implementation details on how to create a dataloader for variable-size input.

Meanwhile, I tried modifying the IPython notebook from that dataloader tutorial. I defined my own dataset class and my own ToTensor() function, and applied only ToTensor() in the transform parameter. In the “Iterating through the dataset” part, it shows only a few batches of data and then crashes like this:


RuntimeError Traceback (most recent call last)
in ()
25 plt.title(‘Batch from dataloader’)
—> 27 for i_batch, sample_batched in enumerate(dataloader):
28 print(i_batch, sample_batched[‘image’].size(),
29 sample_batched[‘label’].size())

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in next(self)
199 self.reorder_dict[idx] = batch
200 continue
→ 201 return self._process_next_batch(batch)
203 next = next # Python 2 compatibility

/usr/local/lib/python2.7/dist-packages/torch/utils/data/dataloader.pyc in _process_next_batch(self, batch)
219 self._put_indices()
220 if isinstance(batch, ExceptionWrapper):
→ 221 raise batch.exc_type(batch.exc_msg)
222 return batch

RuntimeError: Traceback (most recent call last):
File “/usr/local/lib/python2.7/dist-packages/torch/utils/data/”, line 40, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File “/usr/local/lib/python2.7/dist-packages/torch/utils/data/”, line 106, in default_collate
return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
File “/usr/local/lib/python2.7/dist-packages/torch/utils/data/”, line 106, in
return {key: default_collate([d[key] for d in batch]) for key in batch[0]}
File “/usr/local/lib/python2.7/dist-packages/torch/utils/data/”, line 91, in default_collate
return torch.stack(batch, 0, out=out)
File “/usr/local/lib/python2.7/dist-packages/torch/”, line 66, in stack
return, dim, out=out)
RuntimeError: inconsistent tensor sizes at /pytorch/torch/lib/TH/generic/THTensorMath.c:2709

Any idea? Thanks!


You need to write a custom collate_fn and pass it to your data loader. See: Questions about Dataloader and Dataset


By default, torch stacks the input images to form a tensor of size N*C*H*W, so every image in the batch must have the same height and width. To load a batch of variable-size input images, we have to supply our own collate_fn, which is the function used to pack a batch of images.
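To make the stacking constraint concrete, here is a minimal sketch that tries the same torch.stack call on two random tensors standing in for images. The helper name try_stack and the tensor sizes are made up for illustration.

```python
import torch

# Two fake RGB "images" with different spatial sizes (C, H, W).
img_a = torch.rand(3, 224, 300)
img_b = torch.rand(3, 224, 180)

def try_stack(tensors):
    """Return the stacked batch, or None if the shapes are incompatible."""
    try:
        return torch.stack(tensors, 0)
    except RuntimeError:
        return None

same_size = try_stack([img_a, img_a])   # shapes match, so this stacks to (2, 3, 224, 300)
mixed_size = try_stack([img_a, img_b])  # widths differ, so stacking fails
print(same_size.shape, mixed_size)
```

This is the same failure that shows up inside default_collate in the traceback above.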

For image classification, the input to collate_fn is a list of length batch_size. Each element is a tuple whose first element is the input image (a torch.FloatTensor) and whose second element is the image label, which is simply an int. Because the samples in a batch have different sizes, we can store them in a list and store the corresponding labels in a torch.LongTensor. Then we put the image list and the label tensor into a list and return the result.

Here is a very simple snippet to demonstrate how to write a custom collate_fn:

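import torch
from torch.utils.data import DataLoader
from torchvision import transforms
import torchvision.datasets as datasets
import matplotlib.pyplot as plt

# a simple custom collate function, just to show the idea
def my_collate(batch):
    data = [item[0] for item in batch]    # keep images in a list, they differ in size
    target = [item[1] for item in batch]
    target = torch.LongTensor(target)     # labels are plain ints, so they can be stacked
    return [data, target]

def show_image_batch(img_list, title=None):
    num = len(img_list)
    fig = plt.figure()
    for i in range(num):
        ax = fig.add_subplot(1, num, i+1)
        # the plotting calls below are an assumed completion: convert C*H*W to H*W*C for imshow
        ax.imshow(img_list[i].numpy().transpose([1, 2, 0]))
        if title is not None:
            ax.set_title(title[i])
    plt.show()

#  do not do randomCrop to show that the custom collate_fn can handle images of different size
# (transforms.Scale was renamed to transforms.Resize in later torchvision releases)
train_transforms = transforms.Compose([transforms.Resize(size=224),
                                       transforms.ToTensor()])

# change root to valid dir in your system, see ImageFolder documentation for more info
train_dataset = datasets.ImageFolder(root="/hd1/jdhao/toyset",
                                     transform=train_transforms)

trainset = DataLoader(dataset=train_dataset,
                      batch_size=4,   # assumed value, anything greater than 1 works
                      shuffle=True,   # assumed value
                      collate_fn=my_collate) # use custom collate function here

trainiter = iter(trainset)
imgs, labels = next(trainiter)

# print(type(imgs), type(labels))
show_image_batch(imgs, title=[train_dataset.classes[x] for x in labels])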

Currently I am using AdaptiveAvgPool2d to downsample variable-sized images. Your strategy of using a custom collate function to build a dataloader for variable-sized input is very useful, but it only gets me halfway there. I want to improve GPU performance by feeding the data into AdaptiveAvgPool2d by batch, so that the data is sent to the GPU by batch instead of by sample. Any suggestions?


I do not think that is possible for a batch of images of variable size. The output sizes only become the same after the AdaptiveAvgPool2d layer, so you cannot stack the images before that layer.

Also, if you want to stack the images after AdaptiveAvgPool2d, you may have to split your network into two parts, which also makes the network more complex.

I suggest that you calculate the loss for each image and accumulate it; after N forward passes, do one backward pass to update the network parameters. You can see some discussions here and here.
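A rough sketch of that per-image loss accumulation could look like the following. The tiny model, the learning rate and the random tensors are placeholders made up for illustration, and averaging the loss over the batch is one possible choice, not the only one.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny classifier; AdaptiveAvgPool2d makes the head independent of input size.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# A "batch" as returned by a list-style collate_fn: images of different sizes.
imgs = [torch.rand(3, 32, 48), torch.rand(3, 40, 40), torch.rand(3, 64, 24)]
labels = torch.tensor([0, 2, 1])

optimizer.zero_grad()
total_loss = 0.0
for img, label in zip(imgs, labels):
    logits = model(img.unsqueeze(0))                  # forward one image at a time
    loss = criterion(logits, label.unsqueeze(0))
    total_loss = total_loss + loss / len(imgs)        # accumulate the averaged loss
total_loss.backward()                                 # one backward pass for N forward passes
optimizer.step()
```

Each image still goes through the network on its own, so this keeps the gradient of a batch without needing to stack the inputs.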

Another way is to pad your images to the same size, which keeps each image’s aspect ratio. I think it is a viable solution if you do not want to change the aspect ratio.
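The padding idea can also live in a collate function. Below is a minimal sketch, assuming each sample is an (image, label) pair with a C*H*W tensor and an int label; the name pad_collate and the choice of zero padding on the right and bottom are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def pad_collate(batch):
    """Zero-pad every image to the largest H and W in the batch, then stack."""
    max_h = max(img.shape[1] for img, _ in batch)
    max_w = max(img.shape[2] for img, _ in batch)
    padded = []
    for img, _ in batch:
        pad_h = max_h - img.shape[1]
        pad_w = max_w - img.shape[2]
        # F.pad order for the last two dims: (left, right, top, bottom)
        padded.append(F.pad(img, (0, pad_w, 0, pad_h), value=0.0))
    images = torch.stack(padded, 0)
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return images, labels

batch = [(torch.ones(3, 20, 30), 0), (torch.ones(3, 25, 10), 1)]
images, labels = pad_collate(batch)
print(images.shape)
```

It would be passed as collate_fn=pad_collate to the DataLoader. Note that the network then sees the padded zeros as part of the image.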


How did you structure your neural network to have varying input sizes?

Hi guys!

Please consider this idea I just came up with.

The idea is:

  • have multiple heads in your CNN and some if-logic in forward statement
  • have a cluster_index for each different type of image
  • feed only consistent batches, but randomly (batch 1 - size 1, batch 2 - size 2, batch 3 - size 1, etc)

My data folder has 4 subfolders (cat1, cat2, cat3, cat4). I want to read the data subfolder by subfolder rather than from the root data folder as a whole, or else read from the root folder but keep the data separated by subfolder.
How do I design this kind of custom DataLoader?

Hi, did you solve this problem eventually? Thanks

Hi Guys,

For the case where you want to work with batches, and the image/label dimensions within a batch might differ, you want to modify the collate_fn as pointed out by smth. The correct way is to construct a new collate_fn.
Let’s say that the label for image 1 has dimension 10, whereas the label for image 2 has dimension 15, and you are trying to create a batch of these images.
The main problem with unequal sizes is that the current collate_fn implementation uses zip. This creates an iterable with the size of the smaller list, so it will only create an iterable with 10 elements in this case.
The solution is to use itertools.zip_longest. This will create an iterable with the dimension of the largest item in the batch, i.e. 15 in our case. This function uses padding, so the values for label 1 will be None in the remaining positions, which you will have to handle at run time in your code after you have got the batch.
Hope this helps! Let me know if anything else is needed.
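To see the difference between the two on the 10 vs 15 example, here is a small plain-Python sketch; the lists label_1 and label_2 are stand-ins for the two labels.

```python
from itertools import zip_longest

label_1 = list(range(10))   # label of dimension 10
label_2 = list(range(15))   # label of dimension 15

truncated = list(zip(label_1, label_2))          # stops at the shorter input
padded = list(zip_longest(label_1, label_2))     # fillvalue defaults to None

print(len(truncated), len(padded))   # 10 15
print(padded[-1])                    # (None, 14)
```

The None entries are what you have to mask out or replace before turning the result into a tensor.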


Can I use torch.nn.DataParallel wrapped around my model after making a collate function to tackle variable sized inputs??
If yes how? Because when using a custom collate function like this I am unable to run my model on multiple GPUs


There seems to be a large collection of posts all over the PyTorch forums, which makes it difficult to find a solution to this issue. I have collected a list of all of them, hopefully making things easier for all of us. Here:


Also, Stack Overflow has a version of this question too:



My version of how to solve the problem is as follows:

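import torch

# device was not defined in the original snippet; this definition is an assumption
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def collate_fn_padd(batch):
    """
    Padds batch of variable length

    note: it converts things ToTensor manually here since the ToTensor transform
    assume it takes in images rather than arbitrary tensors.
    """
    ## get sequence lengths
    lengths = torch.tensor([ t.shape[0] for t in batch ]).to(device)
    ## padd (pad_sequence returns shape max_len * batch * ... by default)
    batch = [ torch.Tensor(t).to(device) for t in batch ]
    batch = torch.nn.utils.rnn.pad_sequence(batch)
    ## compute mask (note: genuine zeros in the data are masked out too)
    mask = (batch != 0).to(device)
    return batch, lengths, mask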

How can I use this method when my input and target are both images, as in pix2pix or semantic segmentation tasks? When I asked this question, someone referred me to this post for a detailed explanation.

In your case, I think you can modify the collate function to return a list of tensors rather than a stacked tensor. mmdetection is a complicated but good example.

Just wondering: since the batch is now a list, where should the ‘.to(device)’ go when running on the GPU? Apparently we can no longer write
for x, y in batch:,

You have to convert them into tensor type.
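For the ‘.to(device)’ question, one option, assuming the collate function returns a list of image tensors plus a label tensor, is to move the elements one by one inside the training loop. The tensors below are random placeholders.

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# What a list-style collate_fn hands back: a list of images plus a label tensor.
imgs = [torch.rand(3, 32, 48), torch.rand(3, 40, 40)]
labels = torch.tensor([1, 0])

imgs = [img.to(device) for img in imgs]   # a list has no .to(), so move each tensor
labels = labels.to(device)
```

If the list holds something other than tensors (e.g. numpy arrays), it has to be converted first, for instance with torch.as_tensor.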

This is helpful. Where does the batch parameter in your function come from?

You can just implement a custom batch sampler which ensures that only batches with the same input size are created. Look here
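A minimal sketch of such a batch sampler is below. It assumes the image sizes are known up front (here as a plain list), and ToyDataset and SameSizeBatchSampler are made-up names for illustration, not part of PyTorch.

```python
import random
from collections import defaultdict

import torch
from torch.utils.data import DataLoader, Dataset

class ToyDataset(Dataset):
    """Hypothetical dataset producing blank images of the given sizes."""
    def __init__(self, sizes):
        self.sizes = sizes  # list of (H, W), one entry per sample

    def __len__(self):
        return len(self.sizes)

    def __getitem__(self, idx):
        h, w = self.sizes[idx]
        return torch.zeros(3, h, w), idx % 2

class SameSizeBatchSampler:
    """Yields lists of indices whose images all share one (H, W)."""
    def __init__(self, sizes, batch_size, shuffle=True):
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.groups = defaultdict(list)
        for idx, size in enumerate(sizes):
            self.groups[size].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.groups.values():
            indices = list(indices)
            if self.shuffle:
                random.shuffle(indices)
            for i in range(0, len(indices), self.batch_size):
                batches.append(indices[i:i + self.batch_size])
        if self.shuffle:
            random.shuffle(batches)  # mix the size groups across the epoch
        return iter(batches)

    def __len__(self):
        # ceil(len(group) / batch_size) summed over all size groups
        return sum(-(-len(v) // self.batch_size) for v in self.groups.values())

sizes = [(32, 32)] * 5 + [(48, 64)] * 3
dataset = ToyDataset(sizes)
loader = DataLoader(dataset, batch_sampler=SameSizeBatchSampler(sizes, batch_size=2))
shapes = [tuple(images.shape) for images, _ in loader]
```

Because every batch is size-consistent, the default collate function can stack it and no custom collate_fn is needed. The last batch of each size group may be smaller than batch_size.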