How to create a dataloader with variable-size images

Is it possible to have a batch-wise dataloader with different sized images for inference?
My current custom C++ dataloader works fine if all my images are resized to the same size. The dataloader is inspired by
If the sizes are not equal, however, the dataloader fails during stacking (torch::stack(), which stacks all images of a batch into one tensor), which makes total sense.

My intention is to use an unoptimized model (no TensorRT) so that I can apply it to images of different sizes. I do not want to resize the images to a common size: the image sizes range from approx. 100x100 up to 3000x3000, so I would lose too much information when resizing.

Hi, I’ve encountered the same problem before and have come up with a solution:

  1. Log the height and width of each image in the DataLoader.
  2. Construct a data loader wrapper that pads all the images to the maximum height and width within each batch.
  3. If padding with ‘0’ affects the evaluation, you should crop each image according to the height/width returned by the data loader wrapper before passing it into the network.
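
A minimal sketch of steps 1 and 2 using PyTorch's Python API. The name `pad_collate` is hypothetical, and the sketch assumes every sample is a `(C, H, W)` float tensor and that zero padding on the right and bottom edges is acceptable:

```python
import torch
import torch.nn.functional as F


def pad_collate(images):
    """Pad a list of (C, H, W) tensors to the batch maximum and stack them.

    Returns the stacked batch and an (N, 2) tensor with each image's
    original (height, width), so the padding can be undone later.
    """
    # Step 1: log the original height and width of every image.
    sizes = torch.tensor([[img.shape[-2], img.shape[-1]] for img in images])
    max_h = int(sizes[:, 0].max())
    max_w = int(sizes[:, 1].max())

    # Step 2: pad on the right and bottom up to the batch maximum.
    # F.pad expects (left, right, top, bottom) for the last two dimensions.
    padded = [
        F.pad(img, (0, max_w - img.shape[-1], 0, max_h - img.shape[-2]), value=0.0)
        for img in images
    ]
    return torch.stack(padded), sizes
```

If the dataset returns bare image tensors, this could be passed as `collate_fn=pad_collate` to `torch.utils.data.DataLoader`; the returned `sizes` tensor is what step 3 would use for cropping.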

In my experience, there’s a neat trick you can use. You should rearrange the data index for your evaluation data loader. For instance, you can sort the entire evaluation data by computation burden (using, for example, the metric “area”, i.e., height x width). In doing so, for each batch during evaluation, the sizes will be roughly similar, significantly reducing the size gaps among the inference batches.

Here is a snippet of my code:

        if self.extra_dataset == False and self.no_sort == False:
            self.imgs_new = []
            self.cal_load_list = torch.zeros(self.validate_num)
            print('Pre-reading the resized image size info and sorting... please wait for around 1 min')

            for i in range(self.validate_num):
                img_path, _ = self.imgs[i]
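                # assumed reconstruction of a garbled line: load with PIL (from PIL import Image)
                img = Image.open(img_path).convert("RGB")
                cur_dataset = super(EvalDatasetConstructor, self).get_cur_dataset(img_path)
                # do not resize, just compute the resized size to accelerate this pre-pass
                H, W = super(EvalDatasetConstructor, self).resize(img, cur_dataset, perform_resize=False)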
                cal_load =  H*W
                self.cal_load_list[i] = cal_load

            # sort the img_path entries in descending order according to cal_load
            new_load_list, indices = torch.sort(self.cal_load_list, descending=True)
            for i in range(self.validate_num):
                cur_index = indices[i].item()
                # select the img_path-den_path pair from self.imgs to form a new imgs_new list sorted by cal_load
                self.imgs_new.append(self.imgs[cur_index])

            # finally, rename self.imgs_new to self.imgs
            self.imgs = self.imgs_new
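
The sorting idea can also be sketched independently of the class above. The helper below is hypothetical (`area_sorted_batches` is not part of the snippet) and assumes the (height, width) of every image is already known, for example from the image headers:

```python
def area_sorted_batches(sizes, batch_size):
    """Group sample indices into batches of similar area.

    sizes: list of (height, width) tuples, one per sample.
    Returns a list of index lists, with the largest images first.
    """
    # Order the indices by descending area (height * width).
    order = sorted(
        range(len(sizes)),
        key=lambda i: sizes[i][0] * sizes[i][1],
        reverse=True,
    )
    # Cut the ordered indices into consecutive chunks of batch_size.
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


sizes = [(100, 100), (3000, 3000), (120, 90), (2800, 3000)]
batches = area_sorted_batches(sizes, batch_size=2)
# The two large images share one batch, the two small ones the other.
```

A list like `batches` can be handed to `torch.utils.data.DataLoader` through its `batch_sampler` argument, so each batch only has to be padded to the size of its own largest image.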

Hi @Zhaoyi-Yan ,
thanks for the quick reply! That’s a nice solution. As padded areas will affect my results, I need to crop the images of the batch before feeding them into the model. How should I do this step? If I loop through the images of each batch, crop them to their original size, and then feed one image after another to the model, don’t I lose the advantages of batch-wise inference?

My current prediction step for same-sized images is something like this:

for (auto& batch : *dataloader) {
    // assumed argument: batch.images, the stacked image tensor of the batch
    torch::Tensor tensor = Model.forward(std::vector<torch::jit::IValue>{batch.images}).toTensor();
}

With your idea I would do something like this, however I don’t:

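for (auto& batch : *dataloader) {
    for (int i = 0; i < batchsize; ++i) {
        torch::Tensor image = batch.images.index({i,                           // image idx in batch
                                                  torch::indexing::Slice(),    // image width
                                                  torch::indexing::Slice()});  // image height

        // assumed argument: index() drops the batch dimension, so it is restored before the forward pass
        torch::Tensor tensor = Model.forward(std::vector<torch::jit::IValue>{image.unsqueeze(0)}).toTensor();
    }
}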

There are two options you can consider:

  1. If the task you’re tackling involves pixel-level prediction, you can perform masking on the ROIs. This should reduce the computational burden and give results relatively close to those obtained with cropped images. You can then rerun inference with the model that performed best on the validation dataset, this time with cropped images as inputs.

  2. If you’re working on a classification-like task, it’s recommended to validate the test dataset using one image per GPU via distributed inference. This shouldn’t be too slow.
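
For option 1, a minimal sketch of building the mask from the logged sizes and cropping padded outputs back to their original extent. It assumes the images were padded on the right and bottom only and that `sizes` is an `(N, 2)` integer tensor of original (height, width) values; both function names are hypothetical. Masking only discards predictions in the padded area. It does not undo any influence the padding had on pixels near the image border.

```python
import torch


def valid_region_mask(sizes, max_h, max_w):
    """Return an (N, 1, max_h, max_w) boolean mask, True inside each original image."""
    rows = torch.arange(max_h).view(1, max_h, 1)
    cols = torch.arange(max_w).view(1, 1, max_w)
    h = sizes[:, 0].view(-1, 1, 1)
    w = sizes[:, 1].view(-1, 1, 1)
    # Broadcasting compares every pixel coordinate with each image's own size.
    mask = (rows < h) & (cols < w)
    return mask.unsqueeze(1)


def crop_predictions(preds, sizes):
    """Split a padded (N, C, H, W) prediction batch into per-image (C, h, w) tensors."""
    return [preds[i, :, :h, :w] for i, (h, w) in enumerate(sizes.tolist())]
```

The mask can be applied to a padded prediction batch with `preds.masked_fill(~mask, 0)`, while `crop_predictions` recovers one tensor per image after a single batched forward pass.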

In both of these approaches, sorting the dataset in __init__ can be helpful based on my experience. This is because it can reduce the average size of the entire resized validation dataset.
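
For option 2, one simple way to divide a sorted index list among workers is a strided split. This is only a sketch: `shard_indices` is a hypothetical helper, and `rank` and `world_size` are assumed to come from whatever launcher is used (for example `torch.distributed.get_rank()` and `torch.distributed.get_world_size()`).

```python
def shard_indices(sorted_indices, rank, world_size):
    """Give each worker every world_size-th index, starting at its own rank."""
    return sorted_indices[rank::world_size]


sorted_indices = [1, 3, 2, 0, 4]  # e.g. image indices sorted by descending area
shards = [shard_indices(sorted_indices, r, 2) for r in range(2)]
```

Because neighbouring entries in the sorted list have similar areas, each worker receives a comparable mix of large and small images.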

My task is semantic segmentation of defects, so pixel-level prediction. My model is already trained and validated. Training and validation images are all of the same size (crops of larger images). Only during inference is the model applied to images of different sizes.
I’m not sure if masking with a ROI is sufficient, as the prediction for each pixel also depends on the neighboring pixels. I will check which approach is more promising.
Sorting the images before splitting them into batches makes total sense.
Thanks a lot again!