Invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 4 in dimension 1

Hey,
I’m using the segmentation_models.pytorch repo (GitHub - qubvel/segmentation_models.pytorch: Segmentation models with pretrained backbones. PyTorch.), and I’m trying to use a 4-channel input image instead of the 3 channels that most of the code seems to expect. I’m getting the error below and I’m stumped on how to debug it. Anyone have any ideas? Is there anywhere I should print tensor shapes?

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 257, in fit
    for i, batch in enumerate(train_dataloader):
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 65, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 4 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Based on the stack trace it seems that the DataLoader is raising this error while trying to stack the image tensors into a batch, as some tensors seem to use 3 channels while others use 4.
Did you make sure that each image indeed has 4 channels? You could set the batch_size to 1, iterate the DataLoader once for an entire epoch, and check the number of channels of each image, e.g. with the sketch below.
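
A minimal sketch of that check (assuming your Dataset instance is called train_dataset and returns dicts with an "image" key, as in your later snippets; adjust the names to your code):

    from torch.utils.data import DataLoader

    # batch_size=1 so every yielded batch corresponds to exactly one sample
    debug_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0)

    for i, batch in enumerate(debug_loader):
        image = batch["image"]  # expected shape: [1, 4, H, W]
        if image.shape[1] != 4:
            print(f"sample {i} has {image.shape[1]} channels, shape {tuple(image.shape)}")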


Looking at this further, I feel like I’m getting some strange DataLoader behavior. In my Dataset class’s __getitem__ I am stacking my depth image onto my RGB image as a 4th channel with np.dstack to get a shape of (H, W, 4). The depth images definitely exist, but for some reason I end up with a tensor with 3 channels instead of 4 shortly after starting training. The dstack that adds the 4th channel never raises an error; the tensor just somehow has 3 channels.

python3.6, torch 1.2.0

    def __getitem__(self, i):
        id = self.ids[i]
        image_path = os.path.join(self.images_dir, id)
        dsm_image_path = os.path.join(self.dsm_images_dir, id)
        mask_path = os.path.join(self.masks_dir, id)
        loss_mask_path = os.path.join(self.loss_masks_dir, id) if self.loss_masks_dir else None # loss masks should have same filename as og image

        print(id)
        # concat 3 RGB and 1 Depth channel
        rgb_image = self.read_image(image_path)
        dsm_image = self.read_image(dsm_image_path)
        # resize dsm image to rgb image if sizes are different
      
        image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
torch.Size([1, 4, 768, 768])
11946.tiff
train:   0%|                                                                               | 1/63647 [00:08<152:35:49,  8.63s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 4, 768, 768])
26528.tiff
train:   0%|                                                                                | 2/63647 [00:08<64:36:32,  3.65s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 4, 768, 768])
100243.tiff
train:   0%|                                                                                | 3/63647 [00:08<36:17:29,  2.05s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 3, 768, 768])
102467.tiff
train:   0%|                                                                                | 3/63647 [00:09<53:39:15,  3.03s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 266, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead

Using batch size 1, num workers 1 (to try to debug this)

train:   0%|                                                                                        | 0/63647 [00:00<?, ?it/s]
torch.Size([1, 4, 768, 768])
['44873.tiff']
train:   0%|                   | 1/63647 [00:09<164:02:23,  9.28s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
torch.Size([1, 4, 768, 768])
['31376.tiff']
train:   0%|                    | 2/63647 [00:09<69:21:28,  3.92s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
torch.Size([1, 3, 768, 768])
['5430.tiff']
train:   0%|                    | 2/63647 [00:09<84:31:52,  4.78s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 267, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead

Disable the shuffling in the DataLoader (if not already done) and print the batch index to further isolate the problematic sample (also keep batch_size=1).
With this index you can directly index the Dataset and check why this sample only has 3 channels, roughly as in the sketch below.
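
Something along these lines (again assuming the train_dataset / "image" names from before; they are placeholders, not your actual code):

    from torch.utils.data import DataLoader

    # With shuffle=False and batch_size=1 the batch index equals the Dataset index,
    # so the failing sample can be pulled out and inspected directly afterwards.
    loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0)

    bad_idx = None
    for idx, batch in enumerate(loader):
        if batch["image"].shape[1] != 4:
            bad_idx = idx
            print("bad batch index:", idx)
            break

    if bad_idx is not None:
        sample = train_dataset[bad_idx]  # re-runs __getitem__ for that index
        print(sample["image"].shape)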

None of the images whose shapes get printed actually have that bad size (768x768):

train:   0%|                                                                 | 0/63647 [00:00<?, ?it/s]
Converting from (206, 206, 3)
(206, 206, 4)
Converting from (250, 251, 3)
(250, 251, 4)
torch.Size([1, 4, 768, 768])
['78703.tiff']
batch index 0
Converting from (191, 191, 3)
(191, 191, 4)
train:   0%| | 1/63647 [00:07<135:26:18,  7.66s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.
torch.Size([1, 4, 768, 768])
['28456.tiff']
batch index 1
Converting from (228, 228, 3)
(228, 228, 4)
train:   0%| | 2/63647 [00:07<59:14:39,  3.35s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
torch.Size([1, 3, 768, 768])
['79771.tiff']
batch index 2
train:   0%| | 2/63647 [00:08<78:48:11,  4.46s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 268, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead


In the output it looks like one of the images is actually 3 channels instead of 4:

train:   0%| | 2/63647 [00:07<59:14:39,  3.35s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
torch.Size([1, 3, 768, 768])
['79771.tiff']

Indeed, but I’m not sure how or why. The code I have that adds the 4th channel never throws any errors, and the batch ids I print out are all valid and not related to the offending tensor.

This could be a case where the actual image file doesn’t agree with what the dataset claims; you might want to check whether the input image in this case is really a 4-channel TIFF (see the sketch below). For example, I believe some images claimed to be “jpegs” in ImageNet are actually PNGs!
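
For instance (the directory layout here is hypothetical; point it at wherever 79771.tiff lives on disk):

    from PIL import Image
    import numpy as np

    # Open the suspicious RGB and depth files directly, bypassing the Dataset code,
    # and print their mode and array shape.
    for path in ["images/79771.tiff", "dsm_images/79771.tiff"]:
        img = Image.open(path)
        arr = np.array(img)
        print(path, img.mode, arr.shape, arr.dtype)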

I have a dataset of RGB tiffs (3 channels) and a dataset of depth tiffs (1 channel). I am dstack-ing them in the __getitem__ method of my Dataset class as shown here:

    def __getitem__(self, i):
        id = self.ids[i]
        image_path = os.path.join(self.images_dir, id)
        dsm_image_path = os.path.join(self.dsm_images_dir, id)
        mask_path = os.path.join(self.masks_dir, id)
        loss_mask_path = os.path.join(self.loss_masks_dir, id) if self.loss_masks_dir else None # loss masks should have same filename as og image

        # print(id)
        # concat 3 RGB and 1 Depth channel
        rgb_image = self.read_image(image_path)
        print('Converting from',rgb_image.shape)

        dsm_image = self.read_image(dsm_image_path)
        # TODO To optimize training slightly, this can be done as a pre-processing step instead of during training
        # resize dsm image to rgb image if sizes are different
        if dsm_image.shape[:2] != rgb_image.shape[:2]:
            rgb_size = tuple(rgb_image.shape[:2])
            print('resizing', dsm_image.shape, 'to', rgb_image.shape)
            dsm_image = np.array(Image.fromarray(dsm_image).resize(rgb_size)) # resize (bicubic interpol.), change back to (H,W,1) dims
            dsm_image = dsm_image[:,:, None]
            print('dsm', dsm_image.shape)
        # print(rgb_image.shape)
        image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
        print(image_full.shape)
        # raise(Exception)
        # read data sample
        sample = dict(
            id=id,
            image=image_full,
            mask=self.read_mask(mask_path),
        )
        if loss_mask_path:
            sample['loss_mask'] = self.read_mask(loss_mask_path)
            sample["loss_mask"] = sample["loss_mask"][None] # expand first dim for loss mask

        # apply augmentations, loss_mask is also augmented. May want to change that
        if self.transform is not None:
            sample = self.transform(**sample)

        # print(image_full.shape)
        sample["mask"] = sample["mask"][None]  # expand first dim for mask, ex size(3, 768, 768) -> size(3, 1, 768, 768)
        return sample

It might be easier to debug by raising an assert and only printing when the output doesn’t have 4 channels inside the Dataset, so you can see exactly which RGB and depth image is causing the issue.
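
Something like this right after the dstack in __getitem__ (variable names taken from your snippet above):

    image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
    if image_full.shape[-1] != 4:
        # only print the offending sample's details
        print(id, 'rgb', rgb_image.shape, 'dsm', dsm_image.shape, 'stacked', image_full.shape)
    assert image_full.shape[-1] == 4, f"{id} produced shape {image_full.shape}"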

The assertions never get hit (inside the for loop). Not sure how this is happening. Any more tips @ptrblck?

That’s strange indeed. If I understand you correctly, you are adding an assert statement inside the DataLoader loop with a single sample per batch to check for 4 channels in each input.
However, this assert is never raised and instead the conv layer raises an error claiming that the input has 3 channels, while 4 are expected?
In that case it seems that the channel might have been removed while passing it to the model, so I would check the forward method and see if the input is sliced there, e.g. with a hook as sketched below.
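
One way to check that (a sketch; model is assumed to be your segmentation_models_pytorch model instance, which exposes an encoder attribute):

    # Log what actually reaches the encoder, right before the conv stem that fails.
    def log_encoder_input(module, inputs):
        print("encoder input shape:", tuple(inputs[0].shape))  # expect (N, 4, H, W)

    hook = model.encoder.register_forward_pre_hook(log_encoder_input)
    # ... run a few training steps, then remove the hook:
    hook.remove()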

Actually, the assert is raised. I checked that the image (numpy array) coming from the Dataset class is indeed 4 channels every time. It seems that when the DataLoader is getting the image, it sometimes converts it into 3 channels?

I narrowed it down to albumentations converting it to 3 channels ‘sometimes’. Still can’t pinpoint why it’s doing it.

I would then try to use the already mentioned approach of isolating the Dataset index that is running into the assert, and check how albumentations is transforming this particular sample (see the sketch below).
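
Roughly like this (a sketch under assumptions: raw_image is the un-augmented (H, W, 4) array built by np.dstack for the offending id, and transform is the albumentations pipeline from the Dataset; since the failure looks intermittent, it is re-applied many times):

    # Re-apply the augmentation pipeline to the same raw sample repeatedly and
    # watch for the run where a channel disappears.
    for trial in range(100):
        augmented = transform(image=raw_image)["image"]
        # if your pipeline ends with a ToTensor step, check augmented.shape[0] instead
        if augmented.shape[-1] != 4:
            print("channel dropped on trial", trial, augmented.shape)
            break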


I was able to narrow it down to the HSV shift transform causing it. The error is below if anyone is curious. The version of albumentations I was previously using (0.4.3) did not produce a traceback for it and just skipped the channel it had a problem with (I think). The latest version (1.0.0) is much more verbose, with a traceback clearly showing shift_hsv was causing it.

Thanks for the help @ptrblck

return F."shift_hsv"(image, hue_shift, sat_shift, val_shift) File, line 55, in wrapped_function result = result.reshape(shape) 
ValueError: cannot reshape array of size 1769472 into shape (768,768,4)
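
For reference, a minimal sketch that should hit the same code path (albumentations 1.0.0 assumed; the exact error text may differ between versions):

    import numpy as np
    import albumentations as A

    # A fake 4-channel uint8 image, like the stacked RGB + depth arrays above
    image = np.random.randint(0, 256, (768, 768, 4), dtype=np.uint8)

    # HueSaturationValue relies on an HSV conversion, which needs a 3-channel image
    transform = A.Compose([A.HueSaturationValue(p=1.0)])
    out = transform(image=image)  # fails on the 4-channel input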