RuntimeError: stack expects each tensor to be equal size, but got [3, 224, 224] at entry 0 and [3, 224, 336] at entry 3

Thank you so much! I was stuck at this error for so long.

Hi everyone, I’m new to PyTorch and I have the same problem because I have rectangular images of different sizes.
So if I understood correctly, the DataLoader does not natively support images of different shapes, is that correct?
If so, is there a workaround to handle such cases?

Thanks in advance

Yes, the default collate_fn used in the DataLoader tries to torch.stack the inputs and will fail if the samples have different shapes. A fix would be to write a custom collate_fn and return the samples in e.g. a list. Note that while this would fix the creation of the batch in the DataLoader, your model would most likely not be able to use the list as an input, so you would then need to pass each sample separately, resize it, etc.
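
For illustration, here is a minimal sketch of such a custom collate_fn; the VariableSizeDataset and list_collate names below are just placeholders for this example:

    import torch
    from torch.utils.data import Dataset, DataLoader

    class VariableSizeDataset(Dataset):
        # Toy dataset (hypothetical) returning images with two different widths.
        def __len__(self):
            return 8

        def __getitem__(self, idx):
            width = 224 if idx % 2 == 0 else 336
            image = torch.randn(3, 224, width)
            label = torch.tensor(idx % 2)
            return image, label

    def list_collate(batch):
        # Keep the samples in plain Python lists instead of torch.stack-ing them,
        # which avoids the "stack expects each tensor to be equal size" error.
        images = [sample[0] for sample in batch]
        labels = torch.stack([sample[1] for sample in batch])  # labels share a shape
        return images, labels

    loader = DataLoader(VariableSizeDataset(), batch_size=4, collate_fn=list_collate)

    for images, labels in loader:
        # `images` is a list of tensors with different shapes; the model would
        # need to process them one by one (or after resizing).
        print([tuple(img.shape) for img in images], labels.shape)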


Thanks for this precise response. This worked

I’ve got the same error. I solved it by making the following changes in my script:

  1. Use a tuple in the Resize, as follows:

    transforms.Resize((img_size, img_size))

  2. Define a new collate_fn function (following the solution of @ptrblck):

    def collate_fn(batch):
        batch = list(filter(lambda x: x is not None, batch))
        return torch.utils.data.dataloader.default_collate(batch)

  3. Add it to the DataLoader:

    train_loader = torch.utils.data.DataLoader(dataset=train_transform_data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)
    valid_loader = torch.utils.data.DataLoader(dataset=valid_transform_data, batch_size=batch_size, shuffle=True, collate_fn=collate_fn)

Thank you

Can we try using batch_size = 1? With batch_size=1 the PyTorch DataLoader doesn’t stack multiple samples, so there is no need to collate, and it works. I was able to remove the error by using a batch size of 1. My doubt is: what will happen if we update the weights after every 16 samples with batch_size=1? For the DataLoader the batch_size is 1, but the weights are updated and optimized after every 16 samples, so my effective batch size will in reality be 16. Are there any consequences of using this approach? Can I add the losses for each sample and then backpropagate once 16 samples are evaluated?

Yes, this approach is known as gradient accumulation and could work.
The issue I’m seeing is that batch-size-dependent layers, such as batchnorm layers, might update their running stats with noisy estimates coming from a single sample (or might even crash if not enough values are available to calculate the stats).
Besides that, gradient accumulation is a valid approach and should yield the same results (again assuming that all layers have the same “behavior”).
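
For illustration, a minimal sketch of gradient accumulation with an effective batch size of 16 (the model, optimizer, and random data below are just placeholders):

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 2)                      # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    criterion = nn.CrossEntropyLoss()
    accumulation_steps = 16                       # effective batch size with batch_size=1

    optimizer.zero_grad()
    for step in range(64):
        x = torch.randn(1, 10)                    # single-sample "batch" (batch_size=1)
        y = torch.randint(0, 2, (1,))
        loss = criterion(model(x), y) / accumulation_steps  # scale so gradients average
        loss.backward()                           # gradients accumulate across iterations
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()                      # update once every 16 samples
            optimizer.zero_grad()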


Thanks @ptrblck, I tried all the possible ways to solve this issue without success. It was solved with your advice.


Hello, I tried to resize the image with Resize((size, size)), but it did not work for me:

transforms.Compose([transforms.ToPILImage(),
                    transforms.Resize((224, 224)),
                    transforms.ToTensor(),
                    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                         std=[0.229, 0.224, 0.225])])

I get the same issue:
stack expects each tensor to be equal size, but got [60, 80, 3] at entry 0 and [107, 80, 3] at entry 1

Based on your error message it seems you are not applying the transformation, so try to narrow down where the error is raised and why the transformation wasn’t used.
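
For example, a quick sanity check (a sketch; `dataset` here stands for your Dataset instance and a (sample, target) return is assumed):

    sample, target = dataset[0]   # fetch one sample directly, bypassing the DataLoader
    print(type(sample), getattr(sample, "shape", None))
    # After Resize((224, 224)) + ToTensor() this should print torch.Size([3, 224, 224]).
    # A shape such as [60, 80, 3] means the transform is not applied in __getitem__.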

Thank you very much. It worked. Turns out I had an error in the Dataset Splitting class and transformation was not applied.

max_epochs = 600
val_interval = 2
best_metric = -1
best_metric_epoch = -1
epoch_loss_values = []
metric_values = []
post_pred = Compose([AsDiscrete(argmax=True, to_onehot=3)])
post_label = Compose([AsDiscrete(to_onehot=2)])

for epoch in range(max_epochs):
    print("-" * 10)
    print(f"epoch {epoch + 1}/{max_epochs}")
    model.train()
    epoch_loss = 0
    step = 0
    for batch_data in train_loader:
        step += 1
        inputs, labels = (
            batch_data["image"].to(device),
            batch_data["label"].to(device),
        )
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = loss_function(outputs, labels)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
        print(
            f"{step}/{len(train_ds) // train_loader.batch_size}, "
            f"train_loss: {loss.item():.4f}")
    epoch_loss /= step
    epoch_loss_values.append(epoch_loss)
    print(f"epoch {epoch + 1} average loss: {epoch_loss:.4f}")

    if (epoch + 1) % val_interval == 0:
        model.eval()
        with torch.no_grad():
            for val_data in val_loader:
                val_inputs, val_labels = (
                    val_data["image"].to(device),
                    val_data["label"].to(device),
                )
                roi_size = (160, 160, 160)
                sw_batch_size = 4
                val_outputs = sliding_window_inference(
                    val_inputs, roi_size, sw_batch_size, model)
                val_outputs = [post_pred(i) for i in decollate_batch(val_outputs)]
                val_labels = [post_label(i) for i in decollate_batch(val_labels)]
                # compute metric for current iteration
                dice_metric(y_pred=val_outputs, y=val_labels)

            # aggregate the final mean dice result
            metric = dice_metric.aggregate().item()
            # reset the status for next validation round
            dice_metric.reset()

            metric_values.append(metric)
            if metric > best_metric:
                best_metric = metric
                best_metric_epoch = epoch + 1
                torch.save(model.state_dict(), os.path.join(
                    root_dir, "best_metric_model.pth"))
                print("saved new best metric model")
            print(
                f"current epoch: {epoch + 1} current mean dice: {metric:.4f}"
                f"\nbest mean dice: {best_metric:.4f} "
                f"at epoch: {best_metric_epoch}"
            )

Hey, I have tried to solve my problem using your solution. Now it returns the following error. Can you help?
TypeError: list indices must be integers or slices, not str

I got the same error, but in my case it’s important to keep the original image size. I mean I want to build a NN that takes and returns images of different sizes.
Where can I find an example of such a NN?

I have followed everything said under this topic but I am still getting this error: RuntimeError: stack expects each tensor to be equal size, but got [3, 736, 353] at entry 0 and [3, 729, 350] at entry 1

Here is my code snippet:
def __getitem__(self, index):

    image_path = self.im_files[index].rstrip()
    im_name_splits = image_path.split(os.sep)[-1].split('.')[0].split('_')

    img = Image.open(image_path)

    resize = F.Resize((385, 185), interpolation=InterpolationMode.BILINEAR)
    crop = F.CenterCrop((385, 185))
    img = crop(resize(img))
    img = np.array(img, dtype=np.uint8)

    lable_path = os.path.join(self.labels_base, im_name_splits[0] + '_road_' + im_name_splits[1] + '.png')

    label = Image.open(lable_path)

    label = crop(resize(label))
    label = np.array(label, dtype=np.uint8)
    if label is None:
        raise ValueError(f"Invalid label image: {self.label_paths[image_path]}")

    # Convert NumPy arrays to PIL images
    img = Image.fromarray(img)
    label = Image.fromarray(label)

    img = img.convert("RGB")
    label = label.convert("RGB")

    # Get the dimensions of the label image using the .size attribute
    label_size = label.size
    print(f"label size: {label_size}")

    # convert to binary mask
    road_label = np.array([255, 0, 255])

    # reshape the road_label array to match the label array
    road_label = np.reshape(road_label, (1, 1, 3))

    # cast both arrays to the same dtype, such as uint8
    road_label = road_label.astype(np.uint8)

    # Check if label is a grayscale image
    if label.mode == 'L':
        label = label
    else:
        # It's a color image with color channels, apply np.all along axis=2
        label = np.array(label)
        cond = np.all(label == road_label, axis=2)
        label = label * cond[..., np.newaxis]

    # convert the label to grayscale
    label = np.dot(label[..., :3], [0.2989, 0.5870, 0.1140])
    label = np.expand_dims(label, axis=-1)

    # Convert the PIL image to a NumPy array
    img = np.array(img)
    img = normalize(img, mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    img = img.transpose((2, 0, 1))
    label = label.transpose((2, 0, 1))

    img = torch.from_numpy(img)
    gt_image = torch.from_numpy(label)

    sample = {'image': img, 'label': gt_image}

    if self.transform:
        sample = resize(sample)
        sample = crop(sample)
    if self.augmentations:
        sample = self.augmentations(sample)

    return sample

def __len__(self):
    #print(f"The number of labels is : {len(self.im_files)}")
    return len(self.im_files)

class KittiDatasetTest(Dataset):
    def __init__(self, rootdir, transform=None):
        self.transform = transform
        self.rootdir = rootdir
        self.images_base = os.path.join(self.root, 'testing', 'image_2')
        self.im_files = recursive_glob(rootdir=self.images_base, suffix='.png')
        self.im_files = sorted(self.im_files)

    def __getitem__(self, index):
        image_path = self.im_files[index].rstrip()

        img = Image.open(image_path)
        resize = F.Resize((385, 185), interpolation=InterpolationMode.BILINEAR)
        crop = F.CenterCrop((385, 185))
        img = crop(resize(img))

        img = np.array(img, dtype=np.uint8)

        img = normalize(np.array(img), mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        img = img.transpose((2, 0, 1))

        img = torch.from_numpy(img)
        sample = {'image': img}
        if self.transform:
            sample = resize(sample)
            sample = crop(sample)

        return sample

    def __len__(self):
        for path, dirs, files in os.walk(self.im_files):
            n = len(files)
            return n
            break

If tensor_1 is [3, 736, 353], change the tensor [3, 729, 350] to the same format. It looks like you’re working with images, so you have to resize the images to a common size. That should work.
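
For example, a minimal sketch with torchvision (the target size here is just illustrative):

    from torchvision import transforms

    # Give every image the same spatial size before batching so they can be stacked.
    transform = transforms.Compose([
        transforms.Resize((736, 353)),   # (height, width)
        transforms.ToTensor(),           # -> tensor of shape [3, 736, 353]
    ])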

I got this error in __getitem__. I checked it and found that one of the loss labels was “tensor(, dtype=torch.float64)”, which really beats me! :melting_face: