Invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 4 in dimension 1

Hey,
I’m using the segmentation_models.pytorch repo (GitHub - qubvel/segmentation_models.pytorch: Segmentation models with pretrained backbones. PyTorch.), and I’m trying to use a 4-channel input image instead of the 3 channels that most of the code seems to expect. I’m getting the error below and I’m stumped on how to debug it. Anyone have any ideas? Is there anywhere I should print tensor shapes?

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 257, in fit
    for i, batch in enumerate(train_dataloader):
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 801, in __next__
    return self._process_data(data)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 846, in _process_data
    data.reraise()
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/_utils.py", line 369, in reraise
    raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in DataLoader worker process 1.
Original Traceback (most recent call last):
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 178, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 47, in fetch
    return self.collate_fn(data)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in default_collate
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 75, in <dictcomp>
    return {key: default_collate([d[key] for d in batch]) for key in elem}
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 65, in default_collate
    return default_collate([torch.as_tensor(b) for b in batch])
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/utils/data/_utils/collate.py", line 56, in default_collate
    return torch.stack(batch, 0, out=out)
RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 0. Got 3 and 4 in dimension 1 at /pytorch/aten/src/TH/generic/THTensor.cpp:689

Based on the stack trace it seems that the DataLoader is raising this error while trying to stack the image tensors into a batch, as some tensors seem to use 3 channels while others use 4.
Did you make sure that each image indeed has 4 channels? You could set the batch_size to 1, iterate the DataLoader once for an entire epoch, and check the number of channels of each image, e.g. with the sketch below.
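
A minimal sketch of that check (assuming your Dataset instance is called train_dataset and returns dicts with an "image" key, as in your later snippets; adjust the names to your code):

    from torch.utils.data import DataLoader

    # batch_size=1 so every yielded batch corresponds to exactly one sample
    debug_loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0)

    for i, batch in enumerate(debug_loader):
        image = batch["image"]  # expected shape: [1, 4, H, W]
        if image.shape[1] != 4:
            print(f"sample {i} has {image.shape[1]} channels, shape {tuple(image.shape)}")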


Looking at this further, I feel like I’m getting some strange DataLoader behavior. In my Dataset class’s __getitem__ I am stacking my depth image onto my RGB image as a 4th channel with np.dstack to get a shape of (H, W, 4). The depth images definitely exist, but for some reason I end up with a tensor with 3 channels instead of 4 shortly after starting training. The dstack that adds the 4th channel never raises an error; the tensor just somehow has 3 channels.

python3.6, torch 1.2.0

    def __getitem__(self, i):
        id = self.ids[i]
        image_path = os.path.join(self.images_dir, id)
        dsm_image_path = os.path.join(self.dsm_images_dir, id)
        mask_path = os.path.join(self.masks_dir, id)
        loss_mask_path = os.path.join(self.loss_masks_dir, id) if self.loss_masks_dir else None # loss masks should have same filename as og image

        print(id)
        # concat 3 RGB and 1 Depth channel
        rgb_image = self.read_image(image_path)
        dsm_image = self.read_image(dsm_image_path)
        # resize dsm image to rgb image if sizes are different
      
        image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
torch.Size([1, 4, 768, 768])
11946.tiff
train:   0%|                                                                               | 1/63647 [00:08<152:35:49,  8.63s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 4, 768, 768])
26528.tiff
train:   0%|                                                                                | 2/63647 [00:08<64:36:32,  3.65s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 4, 768, 768])
100243.tiff
train:   0%|                                                                                | 3/63647 [00:08<36:17:29,  2.05s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]torch.Size([1, 3, 768, 768])
102467.tiff
train:   0%|                                                                                | 3/63647 [00:09<53:39:15,  3.03s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 266, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead

Using batch size 1, num workers 1 (to try to debug this)

train:   0%|                                                                                        | 0/63647 [00:00<?, ?it/s]
torch.Size([1, 4, 768, 768])
['44873.tiff']
train:   0%|                   | 1/63647 [00:09<164:02:23,  9.28s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
torch.Size([1, 4, 768, 768])
['31376.tiff']
train:   0%|                    | 2/63647 [00:09<69:21:28,  3.92s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
torch.Size([1, 3, 768, 768])
['5430.tiff']
train:   0%|                    | 2/63647 [00:09<84:31:52,  4.78s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0000]
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 267, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead

Disable the shuffling in the DataLoader (if not already done) and print the batch index to further isolate the problematic sample (also keep batch_size=1).
With this index you can directly index the Dataset and check why this sample only has 3 channels, roughly as in the sketch below.
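
Something along these lines (again assuming the train_dataset / "image" names from before; they are placeholders, not your actual code):

    from torch.utils.data import DataLoader

    # With shuffle=False and batch_size=1 the batch index equals the Dataset index,
    # so the failing sample can be pulled out and inspected directly afterwards.
    loader = DataLoader(train_dataset, batch_size=1, shuffle=False, num_workers=0)

    bad_idx = None
    for idx, batch in enumerate(loader):
        if batch["image"].shape[1] != 4:
            bad_idx = idx
            print("bad batch index:", idx)
            break

    if bad_idx is not None:
        sample = train_dataset[bad_idx]  # re-runs __getitem__ for that index
        print(sample["image"].shape)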

None of the images whose shapes get printed actually have that bad size (768x768):

train:   0%|                                                                 | 0/63647 [00:00<?, ?it/s]
Converting from (206, 206, 3)
(206, 206, 4)
Converting from (250, 251, 3)
(250, 251, 4)
torch.Size([1, 4, 768, 768])
['78703.tiff']
batch index 0
Converting from (191, 191, 3)
(191, 191, 4)
train:   0%| | 1/63647 [00:07<135:26:18,  7.66s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.
torch.Size([1, 4, 768, 768])
['28456.tiff']
batch index 1
Converting from (228, 228, 3)
(228, 228, 4)
train:   0%| | 2/63647 [00:07<59:14:39,  3.35s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
torch.Size([1, 3, 768, 768])
['79771.tiff']
batch index 2
train:   0%| | 2/63647 [00:08<78:48:11,  4.46s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 188, in <module>
    main(cfg)
  File "/home/bhaktatejas922/internal-geometry-ml/src/train.py", line 176, in main
    **cfg.training.fit,
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 268, in fit
    output = self._feed_batch(batch)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 35, in wrapped
    res = f(*args, **kwargs)
  File "/home/bhaktatejas922/internal-geometry-ml/src/training/runner.py", line 153, in _feed_batch
    output = self.model(*input)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/base/model.py", line 15, in forward
    features = self.encoder(x)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/segmentation_models_pytorch/encoders/efficientnet.py", line 50, in forward
    x = self._swish(self._bn0(self._conv_stem(x)))
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 547, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/bhaktatejas922/.local/lib/python3.6/site-packages/efficientnet_pytorch/utils.py", line 271, in forward
    x = F.conv2d(x, self.weight, self.bias, self.stride, self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size 48 4 3 3, expected input[1, 3, 770, 770] to have 4 channels, but got 3 channels instead


In the output it looks like one of the images is actually 3 channels instead of 4:

train:   0%| | 2/63647 [00:07<59:14:39,  3.35s/it, loss_mask: 1.0000, loss: 1.0000, mask_micro_iou: 0.0
torch.Size([1, 3, 768, 768])
['79771.tiff']

Indeed, but I’m not sure how or why. The code I have that adds the 4th channel never throws any errors, and the batch ids I print out are all valid and not related to the offending tensor.

This could be a case where the actual image file doesn’t agree with what the dataset claims; you might want to check whether the input image in this case is really a 4-channel TIFF (see the sketch below). For example, I believe some images claimed to be “jpegs” in ImageNet are actually PNGs!
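
For instance (the directory layout here is hypothetical; point it at wherever 79771.tiff lives on disk):

    from PIL import Image
    import numpy as np

    # Open the suspicious RGB and depth files directly, bypassing the Dataset code,
    # and print their mode and array shape.
    for path in ["images/79771.tiff", "dsm_images/79771.tiff"]:
        img = Image.open(path)
        arr = np.array(img)
        print(path, img.mode, arr.shape, arr.dtype)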

I have a dataset of RGB tiffs (3 channels) and a dataset of depth tiffs (1 channel). I am dstack-ing them in the __getitem__ method of my Dataset class as shown here:

    def __getitem__(self, i):
        id = self.ids[i]
        image_path = os.path.join(self.images_dir, id)
        dsm_image_path = os.path.join(self.dsm_images_dir, id)
        mask_path = os.path.join(self.masks_dir, id)
        loss_mask_path = os.path.join(self.loss_masks_dir, id) if self.loss_masks_dir else None # loss masks should have same filename as og image

        # print(id)
        # concat 3 RGB and 1 Depth channel
        rgb_image = self.read_image(image_path)
        print('Converting from',rgb_image.shape)

        dsm_image = self.read_image(dsm_image_path)
        # TODO To optimize training slightly, this can be done as a pre-processing step instead of during training
        # resize dsm image to rgb image if sizes are different
        if dsm_image.shape[:2] != rgb_image.shape[:2]:
            rgb_size = tuple(rgb_image.shape[:2])
            print('resizing', dsm_image.shape, 'to', rgb_image.shape)
            dsm_image = np.array(Image.fromarray(dsm_image).resize(rgb_size)) # resize (bicubic interpol.), change back to (H,W,1) dims
            dsm_image = dsm_image[:,:, None]
            print('dsm', dsm_image.shape)
        # print(rgb_image.shape)
        image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
        print(image_full.shape)
        # raise(Exception)
        # read data sample
        sample = dict(
            id=id,
            image=image_full,
            mask=self.read_mask(mask_path),
        )
        if loss_mask_path:
            sample['loss_mask'] = self.read_mask(loss_mask_path)
            sample["loss_mask"] = sample["loss_mask"][None] # expand first dim for loss mask

        # apply augmentations, loss_mask is also augmented. May want to change that
        if self.transform is not None:
            sample = self.transform(**sample)

        # print(image_full.shape)
        sample["mask"] = sample["mask"][None]  # expand first dim for mask, ex size(3, 768, 768) -> size(3, 1, 768, 768)
        return sample

It might be easier to debug by raising an assert and only printing when the output doesn’t have 4 channels inside the Dataset, so you can see exactly which RGB and depth image is causing the issue.
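
Something like this right after the dstack in __getitem__ (variable names taken from your snippet above):

    image_full = np.dstack([rgb_image, self.normalize_image(dsm_image)])
    if image_full.shape[-1] != 4:
        # only print the offending sample's details
        print(id, 'rgb', rgb_image.shape, 'dsm', dsm_image.shape, 'stacked', image_full.shape)
    assert image_full.shape[-1] == 4, f"{id} produced shape {image_full.shape}"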

The assertions never get hit (inside the for loop). Not sure how this is happening. Any more tips @ptrblck?

That’s strange indeed. If I understand you correctly, you are adding an assert statement inside the DataLoader loop with a single sample per batch to check for 4 channels in each input.
However, this assert is never raised and instead the conv layer raises an error claiming that the input has 3 channels, while 4 are expected?
In that case it seems that the channel might have been removed while passing it to the model, so I would check the forward method and see if the input is sliced there, e.g. with a hook as sketched below.
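
One way to check that (a sketch; model is assumed to be your segmentation_models_pytorch model instance, which exposes an encoder attribute):

    # Log what actually reaches the encoder, right before the conv stem that fails.
    def log_encoder_input(module, inputs):
        print("encoder input shape:", tuple(inputs[0].shape))  # expect (N, 4, H, W)

    hook = model.encoder.register_forward_pre_hook(log_encoder_input)
    # ... run a few training steps, then remove the hook:
    hook.remove()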

Actually, the assert is raised. I checked that the image (numpy array) coming from the Dataset class is indeed 4 channels every time. It seems that when the DataLoader is getting the image, it sometimes converts it into 3 channels?

I narrowed it down to albumentations converting it to 3 channels ‘sometimes’. Still can’t pinpoint why it’s doing it.

I would then try to use the already mentioned approach of isolating the Dataset index that is running into the assert, and check how albumentations is transforming this particular sample (see the sketch below).
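
Roughly like this (a sketch under assumptions: raw_image is the un-augmented (H, W, 4) array built by np.dstack for the offending id, and transform is the albumentations pipeline from the Dataset; since the failure looks intermittent, it is re-applied many times):

    # Re-apply the augmentation pipeline to the same raw sample repeatedly and
    # watch for the run where a channel disappears.
    for trial in range(100):
        augmented = transform(image=raw_image)["image"]
        # if your pipeline ends with a ToTensor step, check augmented.shape[0] instead
        if augmented.shape[-1] != 4:
            print("channel dropped on trial", trial, augmented.shape)
            break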


I was able to narrow it down to the HSV shift transform causing it. The error is below if anyone is curious. The version of albumentations I was previously using (0.4.3) did not produce a traceback for it and just skipped the channel it had a problem with (I think). The latest version (1.0.0) is much more verbose, with a traceback clearly showing shift_hsv was causing it.

Thanks for the help @ptrblck

return F."shift_hsv"(image, hue_shift, sat_shift, val_shift) File, line 55, in wrapped_function result = result.reshape(shape) 
ValueError: cannot reshape array of size 1769472 into shape (768,768,4)
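
For reference, a minimal sketch that should hit the same code path (albumentations 1.0.0 assumed; the exact error text may differ between versions):

    import numpy as np
    import albumentations as A

    # A fake 4-channel uint8 image, like the stacked RGB + depth arrays above
    image = np.random.randint(0, 256, (768, 768, 4), dtype=np.uint8)

    # HueSaturationValue relies on an HSV conversion, which needs a 3-channel image
    transform = A.Compose([A.HueSaturationValue(p=1.0)])
    out = transform(image=image)  # fails on the 4-channel input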