RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [3, 224, 336] at entry 3

Thank you so much! I was stuck at this error for so long.

Hi everyone, I’m new to PyTorch and I have the same problem because I have rectangular images of different sizes.
So if I understood correctly, the DataLoader does not natively support images of different shapes, is that correct?
If so, is there a workaround to handle such cases?

Thanks in advance

Yes, the default collate_fn used in the DataLoader tries to torch.stack the inputs and will fail if the samples have different shapes. A fix would be to write a custom collate_fn and return the samples in e.g. a list. Note that while this would fix the creation of the batch in the DataLoader, your model would most likely not be able to use the list as an input, so you would then need to pass each sample separately, resize it, etc.
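
For illustration, here is a minimal sketch of such a custom collate_fn; the VariableSizeDataset and list_collate names below are just placeholders for this example:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class VariableSizeDataset(Dataset):
        # Toy dataset (hypothetical) returning images with two different widths.
        def __len__(self):
            return 8

        def __getitem__(self, idx):
            width = 224 if idx % 2 == 0 else 336
            image = torch.randn(3, 224, width)
            label = torch.tensor(idx % 2)
            return image, label

    def list_collate(batch):
        # Keep the samples in plain Python lists instead of torch.stack-ing them,
        # which avoids the "stack expects each tensor to be equal size" error.
        images = [sample[0] for sample in batch]
        labels = torch.stack([sample[1] for sample in batch])  # labels share a shape
        return images, labels

    loader = DataLoader(VariableSizeDataset(), batch_size=4, collate_fn=list_collate)

    for images, labels in loader:
        # `images` is a list of tensors with different shapes; the model would
        # need to process them one by one (or after resizing).
        print([tuple(img.shape) for img in images], labels.shape)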


Thanks for this precise response. This worked

I’ve got the same error. I solved it by making the following changes in my script:

  1. Use a tuple in the Resize, as follows:

    transforms.Resize((img_size, img_size))

  2. Define a new collate_fn function (following the solution of @ptrblck):

    def collate_fn(batch):
        batch = list(filter(lambda x: x is not None, batch))
        return torch.utils.data.dataloader.default_collate(batch)

  3. Add it to the DataLoader:

    train_loader = torch.utils.data.DataLoader(dataset=train_transform_data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
    valid_loader = torch.utils.data.DataLoader(dataset=valid_transform_data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

Thank you

Can we try using batch_size = 1? With batch_size=1 the PyTorch DataLoader doesn’t stack multiple samples, so there is no need to collate, and it works. I was able to remove the error by using a batch size of 1. My doubt is: what will happen if we update the weights after every 16 samples with batch_size=1? For the DataLoader the batch_size is 1, but the weights are updated and optimized after every 16 samples, so my effective batch size will in reality be 16. Are there any consequences of using this approach? Can I add the losses for each sample and then backpropagate once 16 samples are evaluated?

Yes, this approach is known as gradient accumulation and could work.
The issue I’m seeing is that batch-size-dependent layers, such as batchnorm layers, might update their running stats with noisy estimates coming from a single sample (or might even crash if not enough values are available to calculate the stats).
Besides that, gradient accumulation is a valid approach and should yield the same results (again assuming that all layers have the same “behavior”).
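
For illustration, a minimal sketch of gradient accumulation with an effective batch size of 16 (the model, optimizer, and random data below are just placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                      # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    accumulation_steps = 16                       # effective batch size with batch_size=1

    optimizer.zero_grad()
    for step in range(64):
        x = torch.randn(1, 10)                    # single-sample "batch" (batch_size=1)
        y = torch.randint(0, 2, (1,))
        loss = criterion(model(x), y) / accumulation_steps  # scale so gradients average
        loss.backward()                           # gradients accumulate across iterations
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                      # update once every 16 samples
            optimizer.zero_grad()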


Thanks @ptrblck, I tried all the possible ways to solve this issue without success. It was solved with your advice.


Hello, I tried to resize the image with Resize((size, size)), but it did not work for me:

transforms.Compose([transforms.ToPILImage(),
                    transforms.Resize((224, 224)),
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])])

I get the same issue:
stack expects each tensor to be equal size, but got [60, 80, 3] at entry 0 and [107, 80, 3] at entry 1

Based on your error message it seems you are not applying the transformation, so try to narrow down where the error is raised and why the transformation wasn’t used.
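
For example, a quick sanity check (a sketch; `dataset` here stands for your Dataset instance and a (sample, target) return is assumed):

    sample, target = dataset[0]   # fetch one sample directly, bypassing the DataLoader
    print(type(sample), getattr(sample, "shape", None))
    # After Resize((224, 224)) + ToTensor() this should print torch.Size([3, 224, 224]).
    # A shape such as [60, 80, 3] means the transform is not applied in __getitem__.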

Thank you very much. It worked. Turns out I had an error in the Dataset Splitting class and transformation was not applied.

max_epochs = 600
val_interval = 2
best_metric = -1
best_metric_epoch = -1
epoch_loss_values = []
metric_values = []
post_pred = Compose([AsDiscrete(argmax=True, to_onehot=3)])
post_label = Compose([AsDiscrete(to_onehot=2)])

for epoch in range(max_epochs):
    print("-" * 10)
    print(f"epoch {epoch + 1}/{max_epochs}")
    model.train()
    epoch_loss = 0
    step = 0
    for batch_data in train_loader:
        step += 1
        inputs, labels = (
            batch_data["image"].to(device),
            batch_data["label"].to(device),
        )
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        print(
            f"{step}/{len(train_ds) // train_loader.batch_size}, "
            f"train_loss: {loss.item():.4f}")
    epoch_loss /= step
    epoch_loss_values.append(epoch_loss)
    print(f"epoch {epoch + 1} average loss: {epoch_loss:.4f}")

    if (epoch + 1) % val_interval == 0:
        model.eval()
        with torch.no_grad():
            for val_data in val_loader:
                val_inputs, val_labels = (
                    val_data["image"].to(device),
                    val_data["label"].to(device),
                )
                roi_size = (160, 160, 160)
                sw_batch_size = 4
                val_outputs = sliding_window_inference(
                    val_inputs, roi_size, sw_batch_size, model)
                val_outputs = [post_pred(i) for i in decollate_batch(val_outputs)]
                val_labels = [post_label(i) for i in decollate_batch(val_labels)]
                # compute metric for current iteration
                dice_metric(y_pred=val_outputs, y=val_labels)

            # aggregate the final mean dice result
            metric = dice_metric.aggregate().item()
            # reset the status for next validation round
            dice_metric.reset()

            metric_values.append(metric)
            if metric > best_metric:
                best_metric = metric
                best_metric_epoch = epoch + 1
                torch.save(model.state_dict(), os.path.join(
                    root_dir, "best_metric_model.pth"))
                print("saved new best metric model")
            print(
                f"current epoch: {epoch + 1} current mean dice: {metric:.4f}"
                f"\nbest mean dice: {best_metric:.4f} "
                f"at epoch: {best_metric_epoch}"
            )

Hey, I have tried to solve my problem using your solution. Now it returns the following error. Can you help?
TypeError: list indices must be integers or slices, not str

I got the same error, but in my case it’s important to keep the original image size. I mean I want to build a NN that takes and returns images of different sizes.
Where can I find an example of such a NN?

I have followed everything said under this topic but I am still getting this error: RuntimeError: stack expects each tensor to be equal size, but got [3, 736, 353] at entry 0 and [3, 729, 350] at entry 1

Here is my code snippet:
def __getitem__(self, index):

    image_path = self.im_files[index].rstrip()
    im_name_splits = image_path.split(os.sep)[-1].split('.')[0].split('_')

    img = Image.open(image_path)

    resize = F.Resize((385, 185), interpolation=InterpolationMode.BILINEAR)
    crop = F.CenterCrop((385, 185))
    img = crop(resize(img))
    img = np.array(img, dtype=np.uint8)

    lable_path = os.path.join(self.labels_base, im_name_splits[0] + '_road_' + im_name_splits[1] + '.png')

    label = Image.open(lable_path)

    label = crop(resize(label))
    label = np.array(label, dtype=np.uint8)
    if label is None:
        raise ValueError(f"Invalid label image: {self.label_paths[image_path]}")

    # Convert NumPy arrays to PIL images
    img = Image.fromarray(img)
    label = Image.fromarray(label)

    img = img.convert("RGB")
    label = label.convert("RGB")

    # Get the dimensions of the label image using the .size attribute
    label_size = label.size
    print(f"label size: {label_size}")

    # convert to binary mask
    road_label = np.array([255, 0, 255])

    # reshape the road_label array to match the label array
    road_label = np.reshape(road_label, (1, 1, 3))

    # cast both arrays to the same dtype, such as uint8
    road_label = road_label.astype(np.uint8)

    # Check if label is a grayscale image
    if label.mode == 'L':
        label = label
    else:
        # It's a color image with color channels, apply np.all along axis=2
        label = np.array(label)
        cond = np.all(label == road_label, axis=2)
        label = label * cond[..., np.newaxis]

    # convert the label to grayscale
    label = np.dot(label[..., :3], [0.2989, 0.5870, 0.1140])
    label = np.expand_dims(label, axis=-1)

    # Convert the PIL image to a NumPy array
    img = np.array(img)
    img = normalize(img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    img = img.transpose((2, 0, 1))
    label = label.transpose((2, 0, 1))

    img = torch.from_numpy(img)
    gt_image = torch.from_numpy(label)

    sample = {'image': img, 'label': gt_image}

    if self.transform:
        sample = resize(sample)
        sample = crop(sample)
    if self.augmentations:
        sample = self.augmentations(sample)

    return sample

def __len__(self):
    #print(f"The number of labels is : {len(self.im_files)}")
    return len(self.im_files)

class KittiDatasetTest(Dataset):
    def __init__(self, rootdir, transform=None):
        self.transform = transform
        self.rootdir = rootdir
        self.images_base = os.path.join(self.root, 'testing', 'image_2')
        self.im_files = recursive_glob(rootdir=self.images_base, suffix='.png')
        self.im_files = sorted(self.im_files)

    def __getitem__(self, index):
        image_path = self.im_files[index].rstrip()

        img = Image.open(image_path)
        resize = F.Resize((385, 185), interpolation=InterpolationMode.BILINEAR)
        crop = F.CenterCrop((385, 185))
        img = crop(resize(img))

        img = np.array(img, dtype=np.uint8)

        img = normalize(np.array(img), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        img = img.transpose((2, 0, 1))

        img = torch.from_numpy(img)
        sample = {'image': img}
        if self.transform:
            sample = resize(sample)
            sample = crop(sample)

        return sample

    def __len__(self):
        for path, dirs, files in os.walk(self.im_files):
            n = len(files)
            return n
            break

If tensor_1 is [3, 736, 353], change the tensor [3, 729, 350] to the same format. It looks like you’re working with images, so you have to resize the images to a common size. That should work.
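
For example, a minimal sketch with torchvision (the target size here is just illustrative):

    from torchvision import transforms

    # Give every image the same spatial size before batching so they can be stacked.
    transform = transforms.Compose([
        transforms.Resize((736, 353)),   # (height, width)
        transforms.ToTensor(),           # -> tensor of shape [3, 736, 353]
    ])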

I got this error in __getitem__. I checked it and found that one of the loss labels was “tensor(, dtype=torch.float64)”, which really beats me! :melting_face: