RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

I am training a pretrained DeepLabV3 model with a ResNet-101 backbone and got the following error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead.

I actually need to feed two images, concatenated along the channel dimension, into the architecture.

I have only two classes and set up the model like this:

model = deeplabv3_resnet101(pretrained=True)
num_classes = len(classes)  # Number of segmentation classes

# Modify the final classification layer 
model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

and got the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-35-87c7cdd319e7> in <cell line: 64>()
     81         # Forward pass
     82         # outputs = model(images)['out']
---> 83         outputs =model(images)
     84 
     85         # Calculate the loss


9 frames


/usr/local/lib/python3.10/dist-packages/torch/nn/modules/conv.py in _conv_forward(self, input, weight, bias)
    454                             weight, bias, self.stride,
    455                             _pair(0), self.dilation, self.groups)
--> 456         return F.conv2d(input, weight, bias, self.stride,
    457                         self.padding, self.dilation, self.groups)
    458 

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

How can I modify the first conv layer of the DeepLabV3 architecture to take 6 input channels instead of 3?

Please help

You can use the same approach you are already using to replace the classifier. In the same way that you replace .classifier[4]:

model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

you can replace the first conv layer of the backbone:

import torch
import torch.nn as nn
from torchvision import models

model = models.segmentation.deeplabv3_resnet101()
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

x = torch.randn(4, 6, 128, 128)
out = model(x)
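
Since you start from pretrained=True, you may also want to keep the pretrained RGB filters instead of initializing the new layer randomly. Here is a minimal sketch of one common approach (copying the 3-channel weights into both halves of the new 6-channel kernel; whether this suits your data is an assumption on my side, not a requirement):

import torch
import torch.nn as nn
from torchvision import models

model = models.segmentation.deeplabv3_resnet101(pretrained=True)

old_conv = model.backbone.conv1  # pretrained conv: weight shape [64, 3, 7, 7]
new_conv = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

with torch.no_grad():
    # reuse the pretrained RGB filters for both 3-channel halves of the input
    new_conv.weight[:, :3] = old_conv.weight
    new_conv.weight[:, 3:] = old_conv.weight

model.backbone.conv1 = new_conv

Some practitioners also scale the copied weights by 0.5 so the activation statistics stay close to the original 3-channel case.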

Okay Sir, thank you!
I have done it as shown below. Please tell me whether this is right or wrong:

new_conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
model.backbone.conv1 = new_conv1

Your code looks correct and is doing the same as my code snippet.
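
As a quick sanity check (the random tensor is just a stand-in for your real batch), you can verify the output shape:

import torch

x = torch.randn(4, 6, 128, 128)  # dummy batch: 4 samples, 6 channels
out = model(x)['out']            # DeepLabV3 returns a dict; 'out' holds the logits
print(out.shape)                 # expected: torch.Size([4, num_classes, 128, 128])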

Yes Sir. 🙂
I haven't used x = torch.randn(4, 6, 128, 128), but it still runs fine.

Sir, when I print the summary of this model, it again shows a size mismatch error.

model = deeplabv3_resnet101(num_classes=2)
summary(model, input_size=(batch_size,6,128,128))

The error is

RuntimeError: Error(s) in loading state_dict for DeepLabV3:
	Unexpected key(s) in state_dict: "aux_classifier.0.weight", "aux_classifier.1.weight", "aux_classifier.1.bias", "aux_classifier.1.running_mean", "aux_classifier.1.running_var", "aux_classifier.1.num_batches_tracked", "aux_classifier.4.weight", "aux_classifier.4.bias". 
	size mismatch for backbone.conv1.weight: copying a param with shape torch.Size([64, 6, 7, 7]) from checkpoint, the shape in current model is torch.Size([64, 3, 7, 7]).

Yes, because you are not replacing the first linear layer as shown in my example in your latest code snippet.
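
The "Unexpected key(s)" and size-mismatch messages also suggest the checkpoint was saved from the modified model while the model you are loading it into was created fresh. A sketch of one way to make the shapes line up (checkpoint.pth is a hypothetical path, and dropping the aux_classifier keys assumes you don't need the auxiliary head):

import torch
import torch.nn as nn
from torchvision.models.segmentation import deeplabv3_resnet101

# recreate the model with the same architecture changes as during training
model = deeplabv3_resnet101(num_classes=2)
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

state_dict = torch.load("checkpoint.pth")  # hypothetical checkpoint path
# the pretrained model carried an aux_classifier this fresh model does not have
state_dict = {k: v for k, v in state_dict.items() if not k.startswith("aux_classifier")}
model.load_state_dict(state_dict)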

Sir, one is the final layer we need to change for the number of classes, and the other is the first layer, changed to take 6 channels:

images = torch.cat((input1, input2), 1)
model = deeplabv3_resnet101(pretrained=True)
num_classes = len(classes)  # Number of segmentation classes

# 1st layer
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

# final classification layer
model.classifier[4] = nn.Conv2d(256, num_classes, kernel_size=(1, 1))

outputs = model(images)['out']

It takes the images as a tensor concatenated along the channel dimension. If I use torch.rand(4, 6, 128, 128), won't the model then be fixed to this input size only? Because at test time we will pass variable-size inputs.

Also, I tried using it, but it gives an error:

images = torch.rand(batch_size, 6, 128, 128)
outputs = model(images)['out']

RuntimeError: Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same or input should be a MKLDNN tensor and weight is a dense tensor

Sorry, yes, I meant the first conv layer in my previous post.
If you don't replace it, you will keep running into the same error explaining that the model expects inputs with 3 channels.

It depends on the model architecture and dimension. The channel dimension will be fixed, the batch dimension is always variable (unless you hard-code it into the model, which I would consider a bug), and the spatial dimensions could be variable depending on the model architecture.
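
For example (the shapes here are arbitrary), the modified model accepts varying batch and spatial sizes, but the channel dimension must stay 6:

import torch

device = next(model.parameters()).device  # run on the same device as the model
out1 = model(torch.randn(4, 6, 128, 128, device=device))['out']  # works
out2 = model(torch.randn(2, 6, 256, 192, device=device))['out']  # different batch/spatial size: also works
# model(torch.randn(4, 3, 128, 128, device=device))              # would fail: 3 channels instead of 6
print(out1.shape, out2.shape)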

Check the error message and make sure the input is moved to the GPU before being passed to the model.

Yes Sir, I replaced both as shown in the code snippet above: the first layer and the final layer.
And yes, I cross-checked that the input is moved to the GPU before being passed to the model.

The error only comes when I use torch.rand …; otherwise training runs smoothly:

images = torch.rand(batch_size, 6, 128, 128)
outputs = model(images)['out']

After training, the problem comes when printing the summary, which shows the mismatch error.

Exactly, because images is not on the GPU but is still on the host. Move it to the GPU and it should work.
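
A minimal sketch of the device handling, assuming the model itself was already moved to the GPU:

import torch

device = next(model.parameters()).device                 # device the model weights live on
images = torch.rand(batch_size, 6, 128, 128).to(device)  # move the input to the same device
outputs = model(images)['out']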

I have moved the images to the GPU, and now the training part runs. The next step is the summary; I am not able to locate the error.

The code is:

import torchinfo
from torchinfo import summary
from torchvision.models.segmentation import deeplabv3_resnet101
# model = deeplabv3_resnet101(num_classes=2)
# summary(model, input_size=(batch_size,3,128,128))
model = deeplabv3_resnet101(num_classes=2)

# summary(model)
# summary(model, input_size=(6,batch_size,128,128))

summary(model, input_size=(batch_size,6,128,128))

# summary(model)

Error:

RuntimeError: Given groups=1, weight of size [64, 3, 7, 7], expected input[4, 6, 128, 128] to have 3 channels, but got 6 channels instead

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.10/dist-packages/torchinfo/torchinfo.py in forward_pass(model, x, batch_dim, cache_forward_pass, device, mode, **kwargs)
    302     except Exception as e:
    303         executed_layers = [layer for layer in summary_list if layer.executed]
--> 304         raise RuntimeError(
    305             "Failed to run torchinfo. See above stack traces for more details. "
    306             f"Executed layers up to: {executed_layers}"

RuntimeError: Failed to run torchinfo. See above stack traces for more details. Executed layers up to: []

You are again missing the manipulation of the first conv layer:

model = deeplabv3_resnet101(num_classes=2)
summary(model, input_size=(batch_size,6,128,128))

Yes Sir, sorry. Actually I was experimenting with something, so I commented it out.

But the problem still persists as shown above.

No, it doesn’t:

import torch.nn as nn
from torchinfo import summary
from torchvision.models.segmentation import deeplabv3_resnet101

model = deeplabv3_resnet101(num_classes=2)
model.backbone.conv1 = nn.Conv2d(6, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)

summary(model, input_size=(1,6,128,128))
# ====================================================================================================
# Layer (type:depth-idx)                             Output Shape              Param #
# ====================================================================================================
# DeepLabV3                                          [1, 2, 128, 128]          --
# ├─IntermediateLayerGetter: 1-1                     [1, 2048, 16, 16]         --
# │    └─Conv2d: 2-1                                 [1, 64, 64, 64]           18,816
# │    └─BatchNorm2d: 2-2                            [1, 64, 64, 64]           128
# │    └─ReLU: 2-3                                   [1, 64, 64, 64]           --
# │    └─MaxPool2d: 2-4                              [1, 64, 32, 32]           --
# │    └─Sequential: 2-5                             [1, 256, 32, 32]          --
# │    │    └─Bottleneck: 3-1                        [1, 256, 32, 32]          75,008
# │    │    └─Bottleneck: 3-2                        [1, 256, 32, 32]          70,400
# │    │    └─Bottleneck: 3-3                        [1, 256, 32, 32]          70,400
# │    └─Sequential: 2-6                             [1, 512, 16, 16]          --
# │    │    └─Bottleneck: 3-4                        [1, 512, 16, 16]          379,392
# │    │    └─Bottleneck: 3-5                        [1, 512, 16, 16]          280,064
# │    │    └─Bottleneck: 3-6                        [1, 512, 16, 16]          280,064
# │    │    └─Bottleneck: 3-7                        [1, 512, 16, 16]          280,064
# │    └─Sequential: 2-7                             [1, 1024, 16, 16]         --
# │    │    └─Bottleneck: 3-8                        [1, 1024, 16, 16]         1,512,448
# │    │    └─Bottleneck: 3-9                        [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-10                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-11                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-12                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-13                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-14                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-15                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-16                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-17                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-18                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-19                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-20                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-21                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-22                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-23                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-24                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-25                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-26                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-27                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-28                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-29                       [1, 1024, 16, 16]         1,117,184
# │    │    └─Bottleneck: 3-30                       [1, 1024, 16, 16]         1,117,184
# │    └─Sequential: 2-8                             [1, 2048, 16, 16]         --
# │    │    └─Bottleneck: 3-31                       [1, 2048, 16, 16]         6,039,552
# │    │    └─Bottleneck: 3-32                       [1, 2048, 16, 16]         4,462,592
# │    │    └─Bottleneck: 3-33                       [1, 2048, 16, 16]         4,462,592
# ├─DeepLabHead: 1-2                                 [1, 2, 16, 16]            --
# │    └─ASPP: 2-9                                   [1, 256, 16, 16]          --
# │    │    └─ModuleList: 3-34                       --                        15,206,912
# │    │    └─Sequential: 3-35                       [1, 256, 16, 16]          328,192
# │    └─Conv2d: 2-10                                [1, 256, 16, 16]          589,824
# │    └─BatchNorm2d: 2-11                           [1, 256, 16, 16]          512
# │    └─ReLU: 2-12                                  [1, 256, 16, 16]          --
# │    └─Conv2d: 2-13                                [1, 2, 16, 16]            514
# ====================================================================================================
# Total params: 58,635,522
# Trainable params: 58,635,522
# Non-trainable params: 0
# Total mult-adds (G): 15.11
# ====================================================================================================
# Input size (MB): 0.39
# Forward/backward pass size (MB): 244.85
# Params size (MB): 234.54
# Estimated Total Size (MB): 479.79
# ====================================================================================================