Torchvision and dataloader different images shapes

There are some datasets, especially those for object detection, where the images have different shapes. Is there any way to have the DataLoader return a batch of images with different shapes, for instance as a list of torch tensors?



supply a custom collate_fn to the dataloader

following @SimonW's answer, here is an example of a collate_fn.
In this example, mask holds the masks of the images (which have different sizes), while data holds cropped windows from the images (the crops all have the same size). The idea is to wrap the items with different sizes in a list.

In your case, the images have different sizes, so you can do for them what is done here for the masks. Whatever the forward pass needs must still be converted to tensors.

import torch


def default_collate(batch):
    """
    Override PyTorch's `default_collate(batch)`.

    We need our own collate function that wraps things up (image, mask, label).

    In this setup, batch is a list of tuples (the result of calling: img, mask, label = Dataset[i]).
    The output of this function is three elements:
        . data: a pytorch tensor of size (batch_size, c, h, w) of float32. Each sample is a tensor of shape (c, h,
        w) that represents a cropped patch from an image (or the entire image) where: c is the depth of the patches
        (since they are RGB, c=3), h is the height of the patch, and w is its width.
        . mask: a list of batch_size pytorch tensors, each of size (1, h*, w*), full of 1 and 0. Each one is the mask
        of the ENTIRE image (no cropping is performed). Images do not have the same size, and the same goes for
        the masks. Therefore, we can't put the masks in one tensor.
        . target: a vector (pytorch tensor) of length batch_size of type torch.LongTensor containing the image-level
        labels.
    :param batch: list of tuples (img, mask, label)
    :return: 3 elements: tensor data, list of tensors of masks, tensor of labels.
    """
    data = torch.stack([item[0] for item in batch])
    mask = [item[1] for item in batch]  # each element is of size (1, h*, w*), where (h*, w*) changes from one mask to another.
    target = torch.LongTensor([item[2] for item in batch])  # image labels.

    return data, mask, target

Instantiate the loader: train_loader = DataLoader( .... collate_fn=default_collate, ...)

Loop over the data loader:

for img, mask, label in train_loader:
   # do stuff.
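
For anyone who wants to try this end to end, here is a minimal sketch of the same idea. The `ToyDataset` class, its tensor sizes, and the name `list_collate` are made up for illustration; they are not from the code above.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyDataset(Dataset):
    """Hypothetical dataset: fixed-size crops, variable-size masks."""

    def __init__(self, n=5):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        img = torch.rand(3, 8, 8)                   # crop, same size for every sample
        mask = torch.zeros(1, 10 + i, 12 + 2 * i)   # full-image mask, size depends on i
        label = i % 2
        return img, mask, label


def list_collate(batch):
    data = torch.stack([item[0] for item in batch])          # same size: stack
    mask = [item[1] for item in batch]                       # different sizes: keep a list
    target = torch.LongTensor([item[2] for item in batch])
    return data, mask, target


train_loader = DataLoader(ToyDataset(), batch_size=2, shuffle=False, collate_fn=list_collate)
img, mask, label = next(iter(train_loader))
print(img.shape, [tuple(m.shape) for m in mask], label)
```

The crops come back as one tensor, while the masks stay a plain Python list, so nothing forces them to share a shape.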

In the forward function of your model, you need to treat your input as a list.

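import torch
import torch.nn as nn


class myModel(nn.Module):
    def forward(self, input):
        """
        Classify a list of samples.
        :param input: is a list of n tensors with different height and width ...
        :return scores: tensor of scores of shape (n, nbr_classes)
        """
        scores = None
        for i, x in enumerate(input):
            # x is an image of shape (c, h, w); add a batch dimension before forwarding it.
            # `self.net` is a placeholder for your own layers; it must return a tensor of shape (1, nbr_classes).
            score = self.net(x.unsqueeze(0))  # forward x
            if i == 0:
                scores = score
            else:
                scores = torch.cat((scores, score), dim=0)
        return scores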

thank you, nice example.

Hi sbelharbi, can you tell me more about how to modify the forward function here "to treat your input as a list"?
As far as I can tell, the input of a CNN can only be a Tensor:

TypeError: conv1d(): argument 'input' (position 1) must be Tensor, not list

yes, cnns, and pytorch modules in general, operate on tensors, not lists.
the model example above is meant primarily for evaluation. it could work for training too, but there may be a gradient issue.

so, in order to process a list of tensors of different shapes without changing your model, you can do the following:

  1. the list needs to contain tensors. do not feed the list itself to the model. the collate function above lets you build such a list.
  2. loop over the list OUTSIDE the model. at each iteration you get one tensor from the list; feed that tensor to your model.
  3. accumulate the gradient (for training). please see here for two ways to accumulate gradients. each tensor in the list can be a single sample or a tiny mini-batch. make sure to divide the loss by the true total number of samples after forwarding all the tensors.
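
the three steps above can be sketched as follows. this is only an illustration under assumptions: the small model, the image sizes, and the learning rate are invented, and the model uses adaptive pooling so that it accepts any height and width. here each loss is divided by the number of samples before calling backward, so the accumulated gradient equals the gradient of the mean loss.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical model: adaptive pooling makes it accept any height and width.
model = nn.Sequential(
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(4, 2),
)
criterion = nn.CrossEntropyLoss(reduction="sum")
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One minibatch as a list-based collate function would return it:
# a list of images with different shapes, plus a tensor of labels.
images = [torch.rand(3, 16, 20), torch.rand(3, 24, 12), torch.rand(3, 10, 10)]
labels = torch.tensor([0, 1, 1])

optimizer.zero_grad()                        # once per minibatch, not per sample
n = len(images)                              # true number of samples in this minibatch
for x, y in zip(images, labels):             # the loop lives OUTSIDE the model
    score = model(x.unsqueeze(0))            # shape (1, nbr_classes)
    loss = criterion(score, y.unsqueeze(0)) / n
    loss.backward()                          # gradients add up across samples
optimizer.step()                             # one weight update per minibatch
```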

Hi @sbelharbi, thank you for your points and suggestions. This thread is over a year old, so I hope you see this and respond. When I implement the three steps you highlighted, my training becomes extremely slow, and I can't see why. Can you check my implementation?


hi @onyekaokonji,

  1. code (unrelated to slowness):
    i can't tell from your code, and your dataset class is not clear, but i assume images is a list of tensors.
    one main issue: you are zeroing out the gradient on every call, over a single sample.
    you shouldn't do that, because you lose the gradient you just computed. i assume you are trying to accumulate the gradient over all samples of a minibatch.
    you should zero it out in the outer loop, right before entering the inner loop, so that the gradient accumulates while looping over the samples (inner loop).

    also, in the inner loop, you can skip the if test, unless there is a reason to keep it.
    you can call step right after leaving the inner loop: you update the weights after processing all samples of a minibatch.

    num_accumulation_steps should be computed as the number of samples in the minibatch (= len(imgs)). it can change for the last samples.

  2. why is it slow?
    it is expected to be slower than standard minibatch training, because you are now processing each sample alone instead of the whole minibatch at once. here, processing a minibatch could take close to 32 times as long as processing a single sample. you lose most of the gpu benefit (the ability to process several samples at once).

    my suggestion is to resize the images to the same size so they fit in a single tensor and can be processed all at once, unless it is absolutely necessary to keep the image sizes.

    avoid using .item() unless necessary. it requires a gpu-to-cpu transfer (time consuming). you can do all operations on gpu tensors, unless you need cpu values.
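
a rough sketch of the resizing suggestion, in case it helps. the function name `resize_collate`, the target size (32, 32), and the (image, label) sample layout are assumptions for this example. note that resizing to one fixed size does not preserve the aspect ratio.

```python
import torch
import torch.nn.functional as F


def resize_collate(batch, size=(32, 32)):
    """Resize every image to `size` so the whole batch fits in one tensor.

    `batch` is assumed to be a list of (image, label) tuples, with each
    image a float tensor of shape (c, h, w).
    """
    images = [
        # interpolate expects a batch dimension, so add it and remove it again.
        F.interpolate(img.unsqueeze(0), size=size, mode="bilinear", align_corners=False).squeeze(0)
        for img, _ in batch
    ]
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return torch.stack(images), labels


batch = [(torch.rand(3, 40, 60), 0), (torch.rand(3, 25, 30), 1)]
images, labels = resize_collate(batch)
print(images.shape)
```

it can be passed to the loader as collate_fn=resize_collate, after which the model receives an ordinary (batch_size, c, h, w) tensor.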


Thanks for your response @sbelharbi. If I understand your point about zero_grad() correctly, it should be placed in the outer loop, before the inner one, right? Please see the attached screenshot.

For num_accumulation_steps, I actually want it to be 32, which is why I fixed it at that value.

And yes, unfortunately the images I’m working with need their aspect ratio maintained. Their aspect ratios aren’t 1:1 and aren’t all equal either, so I’m forced to feed the images one at a time.

yeah, now it seems fine.
for num_accumulation_steps, it could differ from 32 for the last samples if your dataset size is not divisible by 32 (or whatever the batch size is). to be sure you always average by the right scalar, divide by the number of samples in the current minibatch.
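
a small check of the point about the last minibatch, assuming a made-up dataset of 100 samples and a batch size of 32. the last minibatch holds only 4 samples, so dividing its loss by a fixed 32 would shrink that update by a factor of 8.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(100))   # 100 samples, an arbitrary size
# collate_fn=list keeps each minibatch as a plain list of samples.
loader = DataLoader(dataset, batch_size=32, collate_fn=list)

batch_sizes = [len(batch) for batch in loader]
print(batch_sizes)  # [32, 32, 32, 4]
```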

to speedup, you can use multi-gpus to process a sample per-gpu in parallel (Getting Started with Distributed Data Parallel — PyTorch Tutorials 1.12.1+cu102 documentation), and deal with gradient accumulation. but you may need several gpus to see significant speedup.

you can crop random patches of the same size for training; you may not need to train on full images.
you can keep the original image sizes for evaluation, and evaluate sample by sample.
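
a possible sketch of the random-patch idea. the helper names `random_crop` and `crop_collate`, the patch size of 16, and the (image, label) sample layout are all assumptions for illustration. it also assumes every image is at least as large as the patch in both dimensions.

```python
import random

import torch


def random_crop(img, size):
    """Crop a random (size x size) patch from a (c, h, w) tensor."""
    _, h, w = img.shape
    top = random.randint(0, h - size)     # randint is inclusive on both ends
    left = random.randint(0, w - size)
    return img[:, top:top + size, left:left + size]


def crop_collate(batch, size=16):
    # Assumes every image is at least `size` pixels high and wide.
    patches = torch.stack([random_crop(img, size) for img, _ in batch])
    labels = torch.tensor([label for _, label in batch], dtype=torch.long)
    return patches, labels


batch = [(torch.rand(3, 40, 60), 0), (torch.rand(3, 25, 30), 1), (torch.rand(3, 16, 16), 1)]
patches, labels = crop_collate(batch)
print(patches.shape)
```

since all patches share one size, the training minibatch is a single tensor again and the model can process it in one forward pass.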


Thanks @sbelharbi I will look into your suggestions.