RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

I am training a pretrained DeepLabV3 model with a ResNet-101 backbone and got the following error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead.

I actually need to feed two images, concatenated along the channel dimension, into the architecture.

I have only two classes and set up the model like this:

model = deeplabv3_resnet101(pretrained=True)
num_classes = len(classes)  # Number of segmentation classes

# Modify the final classification layer 
model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

and got the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-35-87c7cdd319e7> in <cell line: 64>()
     81         # Forward pass
     82         # outputs = model(images)['out']
---> 83         outputs =model(images)
     84 
     85         # Calculate the loss


9 frames


/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    454                             weight, bias, self.stride,
    455                             _pair(0), self.dilation, self.groups)
--> 456         return F.conv2d(input, weight, bias, self.stride,
    457                         self.padding, self.dilation, self.groups)
    458 

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

How can I modify the first conv layer of the DeepLabV3 architecture to take 6 input channels instead of 3?

Please help

You can use the same approach you are already using to replace the classifier. In the same way that you replace .classifier[4]:

model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

you can replace the first conv layer of the backbone:

import torch
import torch.nn as nn
from torchvision import models

model = models.segmentation.deeplabv3_resnet101()
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

x = torch.randn(4, 6, 128, 128)
out = model(x)
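
Since you start from pretrained=True, you may also want to keep the pretrained RGB filters instead of initializing the new layer randomly. Here is a minimal sketch of one common approach (copying the 3-channel weights into both halves of the new 6-channel kernel; whether this suits your data is an assumption on my side, not a requirement):

import torch
import torch.nn as nn
from torchvision import models

model = models.segmentation.deeplabv3_resnet101(pretrained=True)

old_conv = model.backbone.conv1  # pretrained conv: weight shape [64, 3, 7, 7]
new_conv = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

with torch.no_grad():
    # reuse the pretrained RGB filters for both 3-channel halves of the input
    new_conv.weight[:, :3] = old_conv.weight
    new_conv.weight[:, 3:] = old_conv.weight

model.backbone.conv1 = new_conv

Some practitioners also scale the copied weights by 0.5 so the activation statistics stay close to the original 3-channel case.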

Okay Sir, thank you!
I have done it as shown below. Please tell me whether this is right or wrong:

new_conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model.backbone.conv1 = new_conv1

Your code looks correct and is doing the same as my code snippet.
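
As a quick sanity check (the random tensor is just a stand-in for your real batch), you can verify the output shape:

import torch

x = torch.randn(4, 6, 128, 128)  # dummy batch: 4 samples, 6 channels
out = model(x)['out']            # DeepLabV3 returns a dict; 'out' holds the logits
print(out.shape)                 # expected: torch.Size([4, num_classes, 128, 128])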

Yes Sir. 🙂
I haven't used x = torch.randn(4, 6, 128, 128), but it still runs fine.

Sir, when I print the summary of this model, it again shows a size mismatch error.

model = deeplabv3_resnet101(num_classes=2)
summary(model, input_size=(batch_size,6,128,128))

The error is

RuntimeError: Error(s) in loading state_dict for DeepLabV3:
	Unexpected key(s) in state_dict: "aux_classifier.0.weight", "aux_classifier.1.weight", "aux_classifier.1.bias", "aux_classifier.1.running_mean", "aux_classifier.1.running_var", "aux_classifier.1.num_batches_tracked", "aux_classifier.4.weight", "aux_classifier.4.bias". 
	size mismatch for backbone.conv1.weight: copying a param with shape torch.Size([64, 6, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 7, 7]).

Yes, because you are not replacing the first linear layer as shown in my example in your latest code snippet.
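
The "Unexpected key(s)" and size-mismatch messages also suggest the checkpoint was saved from the modified model while the model you are loading it into was created fresh. A sketch of one way to make the shapes line up (checkpoint.pth is a hypothetical path, and dropping the aux_classifier keys assumes you don't need the auxiliary head):

import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

# recreate the model with the same architecture changes as during training
model = deeplabv3_resnet101(num_classes=2)
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

state_dict = torch.load("checkpoint.pth")  # hypothetical checkpoint path
# the pretrained model carried an aux_classifier this fresh model does not have
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("aux_classifier")}
model.load_state_dict(state_dict)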

Sir, one is the final layer we need to change for the number of classes, and the other is the first layer, changed to take 6 channels:

images = torch.cat((input1, input2), 1)
model = deeplabv3_resnet101(pretrained=True)
num_classes = len(classes)  # Number of segmentation classes

# 1st layer
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# final classification layer
model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

outputs = model(images)['out']

It takes the images as a tensor concatenated along the channel dimension. If I use torch.rand(4, 6, 128, 128), won't the model then be fixed to this input size only? Because at test time we will pass variable-size inputs.

Also, I tried using it, but it gives an error:

images = torch.rand(batch_size, 6, 128, 128)
outputs = model(images)['out']

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Sorry, yes, I meant the first conv layer in my previous post.
If you don't replace it, you will keep running into the same error explaining that the model expects inputs with 3 channels.

It depends on the model architecture and dimension. The channel dimension will be fixed, the batch dimension is always variable (unless you hard-code it into the model, which I would consider a bug), and the spatial dimensions could be variable depending on the model architecture.
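
For example (the shapes here are arbitrary), the modified model accepts varying batch and spatial sizes, but the channel dimension must stay 6:

import torch

device = next(model.parameters()).device  # run on the same device as the model
out1 = model(torch.randn(4, 6, 128, 128, device=device))['out']  # works
out2 = model(torch.randn(2, 6, 256, 192, device=device))['out']  # different batch/spatial size: also works
# model(torch.randn(4, 3, 128, 128, device=device))              # would fail: 3 channels instead of 6
print(out1.shape, out2.shape)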

Check the error message and make sure the input is moved to the GPU before being passed to the model.

Yes Sir, I replaced both as shown in the code snippet above: the first layer and the final layer.
And yes, I cross-checked that the input is moved to the GPU before being passed to the model.

The error only comes when I use torch.rand …; otherwise training runs smoothly:

images = torch.rand(batch_size, 6, 128, 128)
outputs = model(images)['out']

After training, the problem comes when printing the summary, which shows the mismatch error.

Exactly, because images is not on the GPU but is still on the host. Move it to the GPU and it should work.
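
A minimal sketch of the device handling, assuming the model itself was already moved to the GPU:

import torch

device = next(model.parameters()).device                 # device the model weights live on
images = torch.rand(batch_size, 6, 128, 128).to(device)  # move the input to the same device
outputs = model(images)['out']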

I have moved the images to the GPU, and now the training part runs. The next step is the summary; I am not able to locate the error.

The code is:

import torchinfo
from torchinfo import summary
from torchvision.models.segmentation import deeplabv3_resnet101
# model = deeplabv3_resnet101(num_classes=2)
# summary(model, input_size=(batch_size,3,128,128))
model = deeplabv3_resnet101(num_classes=2)

# summary(model)
# summary(model, input_size=(6,batch_size,128,128))

summary(model, input_size=(batch_size,6,128,128))

# summary(model)

Error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torchinfo/torchinfo.py in forward_pass(model, x, batch_dim, cache_forward_pass, device, mode, **kwargs)
    302     except Exception as e:
    303         executed_layers = [layer for layer in summary_list if layer.executed]
--> 304         raise RuntimeError(
    305             "Failed to run torchinfo. See above stack traces for more details. "
    306             f"Executed layers up to: {executed_layers}"

RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: []

You are again missing the manipulation of the first conv layer:

model = deeplabv3_resnet101(num_classes=2)
summary(model, input_size=(batch_size,6,128,128))

Yes Sir, sorry. Actually I was experimenting with something, so I commented it out.

But the problem still persists as shown above.

No, it doesn’t:

import torch.nn as nn
from torchinfo import summary
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(num_classes=2)
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

summary(model, input_size=(1,6,128,128))
# ====================================================================================================
# Layer (type:depth-idx)                             Output Shape              Param #
# ====================================================================================================
# DeepLabV3                                          [1, 2, 128, 128]          --
# ├─IntermediateLayerGetter: 1-1                     [1, 2048, 16, 16]         --
# │    └─Conv2d: 2-1                                 [1, 64, 64, 64]           18,816
# │    └─BatchNorm2d: 2-2                            [1, 64, 64, 64]           128
# │    └─ReLU: 2-3                                   [1, 64, 64, 64]           --
# │    └─MaxPool2d: 2-4                              [1, 64, 32, 32]           --
# │    └─Sequential: 2-5                             [1, 256, 32, 32]          --
# │    │    └─Bottleneck: 3-1                        [1, 256, 32, 32]          75,008
# │    │    └─Bottleneck: 3-2                        [1, 256, 32, 32]          70,400
# │    │    └─Bottleneck: 3-3                        [1, 256, 32, 32]          70,400
# │    └─Sequential: 2-6                             [1, 512, 16, 16]          --
# │    │    └─Bottleneck: 3-4                        [1, 512, 16, 16]          379,392
# │    │    └─Bottleneck: 3-5                        [1, 512, 16, 16]          280,064
# │    │    └─Bottleneck: 3-6                        [1, 512, 16, 16]          280,064
# │    │    └─Bottleneck: 3-7                        [1, 512, 16, 16]          280,064
# │    └─Sequential: 2-7                             [1, 1024, 16, 16]         --
# │    │    └─Bottleneck: 3-8                        [1, 1024, 16, 16]         1,512,448
# │    │    └─Bottleneck: 3-9                        [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-10                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-11                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-12                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-13                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-14                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-15                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-16                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-17                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-18                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-19                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-20                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-21                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-22                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-23                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-24                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-25                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-26                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-27                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-28                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-29                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-30                       [1, 1024, 16, 16]         1,117,184
# │    └─Sequential: 2-8                             [1, 2048, 16, 16]         --
# │    │    └─Bottleneck: 3-31                       [1, 2048, 16, 16]         6,039,552
# │    │    └─Bottleneck: 3-32                       [1, 2048, 16, 16]         4,462,592
# │    │    └─Bottleneck: 3-33                       [1, 2048, 16, 16]         4,462,592
# ├─DeepLabHead: 1-2                                 [1, 2, 16, 16]            --
# │    └─ASPP: 2-9                                   [1, 256, 16, 16]          --
# │    │    └─ModuleList: 3-34                       --                        15,206,912
# │    │    └─Sequential: 3-35                       [1, 256, 16, 16]          328,192
# │    └─Conv2d: 2-10                                [1, 256, 16, 16]          589,824
# │    └─BatchNorm2d: 2-11                           [1, 256, 16, 16]          512
# │    └─ReLU: 2-12                                  [1, 256, 16, 16]          --
# │    └─Conv2d: 2-13                                [1, 2, 16, 16]            514
# ====================================================================================================
# Total params: 58,635,522
# Trainable params: 58,635,522
# Non-trainable params: 0
# Total mult-adds (G): 15.11
# ====================================================================================================
# Input size (MB): 0.39
# Forward/backward pass size (MB): 244.85
# Params size (MB): 234.54
# Estimated Total Size (MB): 479.79
# ====================================================================================================