Torchvision and DataLoader with different image shapes

There are some datasets, especially those for object detection, where the images have different shapes. Is there any way to use torch.utils.data.DataLoader to return a batch of images with different shapes, for instance a list of torch.Tensors?

Thanks.


Supply a custom collate_fn to the DataLoader.
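For example, a minimal sketch of such a collate_fn (the names list_collate and dataset are illustrative; it assumes each dataset item is an (image, label) pair):

import torch
from torch.utils.data import DataLoader

def list_collate(batch):
    # Keep variable-size images as a plain Python list instead of stacking them.
    images = [item[0] for item in batch]
    # Labels are scalars, so they can still be collected into one tensor.
    labels = torch.tensor([item[1] for item in batch])
    return images, labels

loader = DataLoader(dataset, batch_size=4, collate_fn=list_collate)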

Following @SimonW's answer, here is an example of a collate_fn.
In this example, mask holds the masks of the images (which have different sizes), while data holds cropped windows from the images (the crops all have the same size). The idea is to use a list to wrap the data with different sizes.

In your case, it is the images that have different sizes, so you can apply to them what is done here for the masks. Whatever is needed for the forward pass still has to be converted to tensors.

import torch


def default_collate(batch):
    """
    Override `default_collate`
    (https://pytorch.org/docs/stable/_modules/torch/utils/data/dataloader.html#DataLoader).

    References:
    https://discuss.pytorch.org/t/how-to-create-a-dataloader-with-variable-size-input/8278/3
    https://github.com/pytorch/pytorch/issues/1512

    We need our own collate function that wraps things up (img, mask, label).

    In this setup, `batch` is a list of tuples (the result of calling: img, mask, label = Dataset[i]).
    The output of this function is three elements:
        . data: a pytorch tensor of shape (batch_size, c, h, w) of type float32. Each sample is a tensor of
          shape (c, h_, w_) that represents a cropped patch from an image (or the entire image), where c is the
          depth of the patches (c=3 for RGB), h_ is the height of the patch, and w_ is its width.
        . mask: a list of batch_size pytorch tensors, each of shape (1, h*, w*), full of 1s and 0s: the mask of
          the ENTIRE image (no cropping is performed). The images do not have the same size, and the same goes
          for the masks; therefore, we can't put the masks in one tensor.
        . target: a vector (pytorch tensor) of length batch_size of type torch.LongTensor containing the
          image-level labels.
    :param batch: list of tuples (img, mask, label).
    :return: 3 elements: tensor of data, list of tensors of masks, tensor of labels.
    """
    data = torch.stack([item[0] for item in batch])
    mask = [item[1] for item in batch]  # each element has shape (1, h*, w*), where (h*, w*) changes from one mask to another.
    target = torch.LongTensor([item[2] for item in batch])  # image-level labels.

    return data, mask, target

Instantiation: dataloader = DataLoader( .... collate_fn=default_collate, ...)

Loop over the data loader:

for img, mask, label in train_loader:
    # do stuff.
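A hypothetical loop body (the .cuda() calls and per-sample use of the mask are illustrative): note that img comes back as one stacked tensor while mask is a plain Python list, so per-sample masks are accessed by index.

for img, mask, label in train_loader:
    img, label = img.cuda(), label.cuda()
    for i in range(img.size(0)):
        m = mask[i]  # tensor of shape (1, h*, w*); size varies per sample.
        # ... use m together with the i-th sample as needed.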

In the forward function of your model, you need to treat your input as a list.

import torch
import torch.nn as nn


class myModel(nn.Module):
    def forward(self, input):
        """
        Classify a list of samples.

        :param input: a list of n tensors with different heights and widths.
        :return scores: tensor of scores of shape (n, nbr_classes).
        """
        scores = []
        for x in input:
            # x is a single image (or a tiny mini-batch).
            score = ...  # forward x through your layers here.
            scores.append(score)
        return torch.cat(scores, dim=0)
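Hypothetical usage, assuming the model's layers are defined elsewhere and each list element carries a batch dimension of 1:

model = myModel()
batch = [torch.rand(1, 3, 224, 224), torch.rand(1, 3, 180, 300)]
scores = model(batch)  # shape (2, nbr_classes)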

Thank you, nice example.

Hi @sbelharbi, can you tell me more about how to modify the forward function here to "treat your input as a list"?
As far as I can tell, the input of a CNN can only be a Tensor:

TypeError: conv1d(): argument 'input' (position 1) must be Tensor, not list

Hi,
Yes, CNNs and PyTorch modules generally operate on tensors, not lists.
The model example above is primarily meant for evaluation; it could work for training, but there may be a gradient issue.

So, in order to process a list of tensors of different shapes without changing your model, you can do the following (a sketch follows the list):

  1. The list needs to contain tensors. Do not feed the list itself to the model; the collate function above builds such a list.
  2. Loop over the list OUTSIDE the model. In each iteration, take one tensor from the list and feed it to your model.
  3. Accumulate the gradient (for training). Please see here two ways for gradient accumulation. Each tensor in the list could be a single sample or a tiny mini-batch. Make sure to divide the loss by the true total number of samples after forwarding all the tensors.
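A minimal sketch of such a training step, assuming model, criterion (e.g. nn.CrossEntropyLoss), optimizer, a list images of tensors shaped (1, c, h*, w*), and a LongTensor target of labels are all defined elsewhere:

optimizer.zero_grad()
n = len(images)
for x, y in zip(images, target):
    out = model(x)                         # forward a single sample.
    loss = criterion(out, y.unsqueeze(0))  # y -> shape (1,) to match out.
    (loss / n).backward()                  # divide by the true total sample count; gradients accumulate across iterations.
optimizer.step()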
Thanks.