Dataloader with multiple image sizes - using collate_fn correctly

My dataset contains images of different sizes, and from what I've read, a custom collate_fn is the way to handle that.
I’ve seen some good answers here, but it looks like I still need to fix my collate_fn.
In it, I find the max height and width within the batch and pad smaller images with zeros (on the right and bottom).
The padded sizes come out correct, but iterating over the loader fails with the following error:

inputs, labels = data

ValueError: too many values to unpack (expected 2)

Simple code is attached, so it’s very easy to reproduce (the formatting still looks odd here…)

import os
import numpy as np
import torch
import timm

import torch.nn as nn
import torch.nn.functional as F
from torch.optim import SGD,Adam,lr_scheduler

from torchvision import transforms,models
from torchvision.datasets import ImageFolder

def my_collate(batch):
    # Find the largest height and width in this batch
    max_h = 0
    max_w = 0
    for x, y in batch:
        print('y: ', y)
        h, w = x.shape[1], x.shape[2]
        if h > max_h:
            max_h = h
        if w > max_w:
            max_w = w

    new_batch = []
    for x, y in batch:
        if (x.shape[1] != max_h) or (x.shape[2] != max_w):
            print('Fix (like padding?)')
            h, w = x.shape[1], x.shape[2]
            pad_w = torch.zeros(3, h, max_w - w)
            new_x = torch.cat((x, pad_w), 2)  # adjust width
            new_h, new_w = new_x.shape[1], new_x.shape[2]
            pad_h = torch.zeros(3, max_h - h, new_w)  # adjust height
            new_x = torch.cat((new_x, pad_h), 1)
            new_batch.append((new_x, y))
        else:
            new_batch.append((x, y))  # max dimensions --> just add to the new batch
    return new_batch  # a list of (image, label) tuples

train_transform = transforms.Compose([
    transforms.ToTensor(),  # Normalize expects a tensor, not a PIL image
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

dataset_path = '/home/diffrent_img_sizes_dataset'
train_dataset = ImageFolder(root=os.path.join(dataset_path, 'train'), transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=8, collate_fn=my_collate, num_workers=2, drop_last=True, shuffle=True)

model = timm.models.mobilenetv3_large_100(pretrained=True)
criterion = nn.CrossEntropyLoss()
optimizer = SGD(model.parameters(), lr=0.005, momentum=0.9)
lr_schedule = lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.9)

for epoch in range(1):
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        if torch.cuda.is_available():
            inputs, labels = inputs.cuda(), labels.cuda()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        out = torch.argmax(outputs.detach(), dim=1)

In the collate_fn, if I instead build two lists, x_new_list and y_list, and use

return x_new_list, y_list

then I still get an error (a different one).

How do I fix my collate_fn?
Is there a common way to tackle the different sizes rather than adding zeros this way?

I managed to make it work! The collate function now returns [[N,C,H,W], [N]]: the first element is the batch of images and the second is the labels. I'm not sure it’s the best way to tackle this, though.
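
For anyone landing here later, this is a minimal sketch of a collate function with that return shape. It is not necessarily the exact code I ended up with: the name pad_collate and the use of F.pad are illustrative choices, and it assumes every sample is a (tensor of shape [C,H,W], int label) pair, as ImageFolder produces once ToTensor is applied. The key difference from my first attempt is that it returns two stacked tensors rather than a list of tuples, so `inputs, labels = data` unpacks cleanly and `.cuda()` works on both.

```python
import torch
import torch.nn.functional as F


def pad_collate(batch):
    """Zero-pad each image to the batch's max height/width, then stack.

    Assumes `batch` is a list of (image, label) pairs, where image is a
    [C, H, W] tensor and label is an int.
    """
    max_h = max(x.shape[1] for x, _ in batch)
    max_w = max(x.shape[2] for x, _ in batch)

    images = []
    for x, _ in batch:
        pad_right = max_w - x.shape[2]
        pad_bottom = max_h - x.shape[1]
        # For the last two dims, F.pad takes (left, right, top, bottom)
        images.append(F.pad(x, (0, pad_right, 0, pad_bottom), value=0.0))

    inputs = torch.stack(images)                  # [N, C, max_h, max_w]
    labels = torch.tensor([y for _, y in batch])  # [N]
    return inputs, labels


# Quick check with random tensors standing in for images of different sizes
fake_batch = [(torch.rand(3, 20, 30), 0), (torch.rand(3, 25, 18), 1)]
inputs, labels = pad_collate(fake_batch)
print(inputs.shape, labels.shape)
```

It would be passed to the loader as collate_fn=pad_collate. One thing to keep in mind: because the padding happens after Normalize, a zero in the padded region corresponds to the mean pixel value rather than black.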