RuntimeError: The size of tensor a (80) must match the size of tensor b (95) at non-singleton dimension 2

Hello, I would like to understand this issue and am looking for a solution. I am using a U-Net architecture for my project, and I get this RuntimeError when computing the MSE between the target and the prediction.

My input tensor has a shape of [1, 3, 95, 64], but after it passes through the network I obtain a tensor of shape [1, 3, 80, 64]. What could be the reason for this?

Here is my network:

Sequential(
(0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU(inplace=True)
(3): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(4): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(5): ReLU(inplace=True)
(6): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(7): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(8): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(9): ReLU(inplace=True)
(10): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(11): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(12): ReLU(inplace=True)
(13): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(14): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(15): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(16): ReLU(inplace=True)
(17): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(18): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(19): ReLU(inplace=True)
(20): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(21): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(22): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(23): ReLU(inplace=True)
(24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(25): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(26): ReLU(inplace=True)
(27): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
(28): Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(29): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(30): ReLU(inplace=True)
(31): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(32): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(33): ReLU(inplace=True)
(34): ConvTranspose2d(1024, 1024, kernel_size=(2, 2), stride=(2, 2))
(35): Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(36): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(37): ReLU(inplace=True)
(38): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(39): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(40): ReLU(inplace=True)
(41): ConvTranspose2d(512, 512, kernel_size=(2, 2), stride=(2, 2))
(42): Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(43): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(44): ReLU(inplace=True)
(45): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(46): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(47): ReLU(inplace=True)
(48): ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2))
(49): Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(50): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(51): ReLU(inplace=True)
(52): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(53): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(54): ReLU(inplace=True)
(55): ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2))
(56): Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(57): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(58): ReLU(inplace=True)
(59): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(60): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(61): ReLU(inplace=True)
(62): Conv2d(64, 3, kernel_size=(1, 1), stride=(1, 1))
)

Sorry, I misread the question. Just pad your input to 96×64; that may work.
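For example, a minimal sketch with F.pad, assuming x is your [1, 3, 95, 64] input batch:

import torch.nn.functional as F

# F.pad takes [left, right, top, bottom] for the last two dimensions,
# so this pads the height from 95 to 96 and leaves the width unchanged
x = F.pad(x, [0, 0, 0, 1])  # [1, 3, 95, 64] -> [1, 3, 96, 64]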

I added padding as shown in the architecture, or is that not the correct way to do it? I would appreciate it if you could help me add the padding correctly.

Actually, I am confused that you used an input with shape [95, 64] and still got an output; with a proper U-Net there should be a RuntimeError in the forward function. You can see and try my example code:

import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)
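
# Note: with kernel_size=3 and padding=1, DoubleConv preserves the spatial size,
# so all spatial-size changes come from the pooling and transposed-conv layers.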

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        
        # encoder part
        self.enc1 = DoubleConv(in_channels, 64)
        self.enc2 = DoubleConv(64, 128)
        self.enc3 = DoubleConv(128, 256)
        self.enc4 = DoubleConv(256, 512)
        
        # polling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # bottle neck
        self.bottleneck = DoubleConv(512, 1024)
        
        # up sampling part
        self.up1 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.up3 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.up4 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        
        # decoder part
        self.dec1 = DoubleConv(1024, 512)
        self.dec2 = DoubleConv(512, 256)
        self.dec3 = DoubleConv(256, 128)
        self.dec4 = DoubleConv(128, 64)
        
        # output layer
        self.out = nn.Conv2d(64, out_channels, kernel_size=1)

    def forward(self, x):
        # encoder
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool(enc1))
        enc3 = self.enc3(self.pool(enc2))
        enc4 = self.enc4(self.pool(enc3))
        
        # bottleneck
        bottleneck = self.bottleneck(self.pool(enc4))
        
        # decoder
        dec1 = self.dec1(torch.cat([self.up1(bottleneck), enc4], dim=1))
        dec2 = self.dec2(torch.cat([self.up2(dec1), enc3], dim=1))
        dec3 = self.dec3(torch.cat([self.up3(dec2), enc2], dim=1))
        dec4 = self.dec4(torch.cat([self.up4(dec3), enc1], dim=1))
        
        return self.out(dec4)


if __name__ == "__main__":
    # create a model
    model = UNet(in_channels=3, out_channels=1)
    print("Encoding path:")
    print("1. enc1:\n", model.enc1)
    print("2. pool + enc2:\n", model.pool,model.enc2)
    print("3. pool + enc3:\n", model.pool,model.enc3) 
    print("4. pool + enc4:\n", model.pool,model.enc4)

    print("\nBottleneck:\n")
    print("5. pool + bottleneck:\n", model.bottleneck)

    print("\nDecoding path:\n") 
    print("6. up1 + concat + dec1\n", model.up1, model.dec1)
    print("7. up2 + concat + dec2\n", model.up2, model.dec2)
    print("8. up3 + concat + dec3\n", model.up3, model.dec3)
    print("9. up4 + concat + dec4\n", model.up4, model.dec4)

    print("\nOutput:")
    print("10. out:", model.out)
    # random input
    x = torch.randn(1, 3, 96, 64)
    
    # forward pass
    output = model(x)
    
    print(f"Input shape: {x.shape}")
    print(f"Output shape: {output.shape}")

In this code, I build the same U-Net with a [96, 64] input and get the correct answer.
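(Running this should print Input shape: torch.Size([1, 3, 96, 64]) and Output shape: torch.Size([1, 1, 96, 64]); out_channels=1, and a multiple-of-16 size is downsampled and restored exactly.)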

Also, a strange thing in your code is this layer: ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2)).

In U-Net, every upsampling step should halve the channel count, because the upsampled tensor is concatenated with the corresponding encoder tensor, as shown in the U-Net architecture diagram:


[U-Net architecture diagram: the grey arrows show the concatenation action.]
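For illustration, a minimal sketch of that channel arithmetic (my own example with hypothetical shapes):

import torch
import torch.nn as nn

up1 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)  # halves the channels
bottleneck = torch.randn(1, 1024, 6, 4)  # hypothetical bottleneck output
enc4 = torch.randn(1, 512, 12, 8)        # the matching encoder feature map

merged = torch.cat([up1(bottleneck), enc4], dim=1)
print(merged.shape)  # torch.Size([1, 1024, 12, 8]), ready for DoubleConv(1024, 512)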

I think this is the reason why you can get an output from an oddly shaped input.

Sorry about my poor English. :(

No problem, I understood perfectly. I probably need to explain my problem in more depth. I have a dataset of paired images (input, target), and my goal is to train a network on it so that, given an input image, I can apply the learned weights to produce the output. I have defined the network as a function as follows:

def Unet():
    # encoder stage 1
    layer = [
        nn.Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    layer += [
        nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # encoder stage 2
    layer += [
        nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),

        nn.Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # encoder stage 3
    layer += [
        nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),

        nn.Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # encoder stage 4
    layer += [
        nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),

        nn.Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # bottleneck
    layer += [
        nn.MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False),

        nn.Conv2d(512, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # decoder stage 1
    layer += [
        nn.ConvTranspose2d(1024, 1024, kernel_size=(2, 2), stride=(2, 2)),

        nn.Conv2d(1024, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # decoder stage 2
    layer += [
        nn.ConvTranspose2d(512, 512, kernel_size=(2, 2), stride=(2, 2)),

        nn.Conv2d(512, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # decoder stage 3
    layer += [
        nn.ConvTranspose2d(256, 256, kernel_size=(2, 2), stride=(2, 2)),

        nn.Conv2d(256, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # decoder stage 4
    layer += [
        nn.ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2)),

        nn.Conv2d(128, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True),

        nn.Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False),
        nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True),
        nn.ReLU(inplace=True)
    ]

    # output layer
    layer += [nn.Conv2d(64, 3, kernel_size=(1, 1), stride=(1, 1), padding=0)]

    net = nn.Sequential(*layer)
    return net

For example: the input shape is [1, 3, 64, 95] and the target shape is [1, 3, 64, 95].

When the input passes through the net, the output is a tensor of shape [1, 3, 64, 80]. The MSE loss calculation then raises an error, since the output and the target have different shapes.
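(As a stopgap, a sketch of my own rather than the fix eventually adopted in this thread, one could resize the output to the target's spatial size before computing the loss, though fixing the network itself is cleaner:)

import torch.nn.functional as F

output = config.model(x)  # [1, 3, 64, 80]
output = F.interpolate(output, size=y.shape[-2:], mode='bilinear', align_corners=False)
loss = F.mse_loss(output, y)  # y has shape [1, 3, 64, 95]; shapes now match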

My forward pass is computed as follows:

config.model = Unet()

def forward(imgs, config):
    x, y = imgs[0], imgs[1]
    device = torch.device('cpu')
    x, y = x.to(device), y.to(device)
    # Variable is deprecated since PyTorch 0.4; passing the tensor directly is fine
    return config.model(x), y

config.forward = forward
run(config)

I know; you can see my reply above. The problem in your code is layers like nn.ConvTranspose2d(128, 128, kernel_size=(2, 2), stride=(2, 2)): you should set the channels to (128, 64) instead of (128, 128) so that the upsample halves them (see my code in the reply above).

After I remove the torch.cat calls and stop halving the upsample channels in my code, I get the same output shape ([80, 64]). You can see the (deliberately) wrong code:

class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        
        # encoder part
        self.enc1 = DoubleConv(in_channels, 64)
        self.enc2 = DoubleConv(64, 128)
        self.enc3 = DoubleConv(128, 256)
        self.enc4 = DoubleConv(256, 512)
        
        # polling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # bottle neck
        self.bottleneck = DoubleConv(512, 1024)
        
        # up sampling part
        self.up1 = nn.ConvTranspose2d(1024, 1024, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(512, 512, kernel_size=2, stride=2)
        self.up3 = nn.ConvTranspose2d(256, 256, kernel_size=2, stride=2)
        self.up4 = nn.ConvTranspose2d(128, 128, kernel_size=2, stride=2)
        
        # decoder part
        self.dec1 = DoubleConv(1024, 512)
        self.dec2 = DoubleConv(512, 256)
        self.dec3 = DoubleConv(256, 128)
        self.dec4 = DoubleConv(128, 64)
        
        # output layer
        self.out = nn.Conv2d(64, out_channels, kernel_size=1)

    def forward(self, x):
        # encoder
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool(enc1))
        enc3 = self.enc3(self.pool(enc2))
        enc4 = self.enc4(self.pool(enc3))
        
        # bottleneck
        bottleneck = self.bottleneck(self.pool(enc4))
        
        # decoder
        dec1 = self.dec1(self.up1(bottleneck))
        dec2 = self.dec2(self.up2(dec1))
        dec3 = self.dec3(self.up3(dec2))
        dec4 = self.dec4(self.up4(dec3))
        
        return self.out(dec4)
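(Feeding torch.randn(1, 3, 95, 64) through this variant should produce torch.Size([1, 1, 80, 64]), reproducing the 95 → 80 shrinkage: four floor-dividing pools take 95 down to 5, and four doublings bring it back up only to 80.)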

You can compare the difference between this code and the real U-Net architecture.

Also, you can see the correct code with the concatenation in my earlier reply.

I used exactly the code above, but now I am getting this error:

File "xxxx/model/Unet.py", line 61, in forward
dec1 = self.dec1(torch.cat([self.up1(bottleneck), enc4], dim=1))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 10 but got size 11 for tensor number 1 in the list

Yes, that is because you fed in a tensor with an odd height or width. This U-Net code only supports widths and heights that are multiples of 16 (96 and 64 are suitable), because there are four pooling stages. If you want other sizes, just use dynamic padding.
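To see why a multiple of 16 is needed, you can trace one spatial dimension through the four pools and four upsamples; a quick sketch:

def trace(size, stages=4):
    # each MaxPool2d(kernel_size=2, stride=2) floor-divides the size by 2
    for _ in range(stages):
        size = size // 2
    # each ConvTranspose2d(kernel_size=2, stride=2) exactly doubles it
    for _ in range(stages):
        size = size * 2
    return size

print(trace(95))  # 95 -> 47 -> 23 -> 11 -> 5, doubled back up to 80
print(trace(96))  # 96 -> 48 -> 24 -> 12 -> 6, doubled back up to 96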

Yes, you are right: I can reproduce the same error if I replace the height 96 → 95. Could you please help me with how to use dynamic padding?

Sorry, I had an exam just now. You can add a function to the U-Net for dynamic padding; I will give the complete code:

import torch
import torch.nn as nn
import torch.nn.functional as F


class DoubleConv(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        return self.conv(x)


def dynamic_padding(x1, x2):
    # calculate the difference in shape
    diff_h = x2.size()[2] - x1.size()[2]
    diff_w = x2.size()[3] - x1.size()[3]
    
    # calculate the padding
    pad_h1 = diff_h // 2
    pad_h2 = diff_h - pad_h1
    pad_w1 = diff_w // 2
    pad_w2 = diff_w - pad_w1
    
    # apply padding
    x1 = F.pad(x1, [pad_w1, pad_w2, pad_h1, pad_h2])
    
    return x1
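
# Example (illustrative): for x1 of shape [1, 512, 10, 8] and x2 of shape
# [1, 512, 11, 8], diff_h = 1 and diff_w = 0, so x1 is padded to [1, 512, 11, 8].
# Note that F.pad's list is ordered [left, right, top, bottom] for the last two dims.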


class UNet(nn.Module):
    def __init__(self, in_channels=3, out_channels=1):
        super().__init__()
        
        # encoder part
        self.enc1 = DoubleConv(in_channels, 64)
        self.enc2 = DoubleConv(64, 128)
        self.enc3 = DoubleConv(128, 256)
        self.enc4 = DoubleConv(256, 512)
        
        # polling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        
        # bottle neck
        self.bottleneck = DoubleConv(512, 1024)
        
        # up sampling part
        self.up1 = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
        self.up2 = nn.ConvTranspose2d(512, 256, kernel_size=2, stride=2)
        self.up3 = nn.ConvTranspose2d(256, 128, kernel_size=2, stride=2)
        self.up4 = nn.ConvTranspose2d(128, 64, kernel_size=2, stride=2)
        
        # decoder part
        self.dec1 = DoubleConv(1024, 512)
        self.dec2 = DoubleConv(512, 256)
        self.dec3 = DoubleConv(256, 128)
        self.dec4 = DoubleConv(128, 64)
        
        # output layer
        self.out = nn.Conv2d(64, out_channels, kernel_size=1)

    @staticmethod
    def dynamic_upsample(x1, x2):
        # pad the upsampled tensor to the encoder feature's size, then concatenate
        padded = dynamic_padding(x1, x2)
        return torch.cat([padded, x2], dim=1)

    def forward(self, x):
        # encoder
        enc1 = self.enc1(x)
        enc2 = self.enc2(self.pool(enc1))
        enc3 = self.enc3(self.pool(enc2))
        enc4 = self.enc4(self.pool(enc3))
        
        # bottleneck
        bottleneck = self.bottleneck(self.pool(enc4))
        
        # decoder
        dec1 = self.dec1(self.dynamic_upsample(self.up1(bottleneck), enc4))
        dec2 = self.dec2(self.dynamic_upsample(self.up2(dec1), enc3))
        dec3 = self.dec3(self.dynamic_upsample(self.up3(dec2), enc2))
        dec4 = self.dec4(self.dynamic_upsample(self.up4(dec3), enc1))
        
        return self.out(dec4)


if __name__ == "__main__":
    # create a model

    test_size = [(1, 3, 93, 64), (1, 3, 81, 93), (1, 3, 198, 108)]
    model = UNet(in_channels=3, out_channels=1)
    for size in test_size:
        x = torch.randn(size)
        output = model(x)
        print(f"Input shape: {x.shape}")
        print(f"Output shape: {output.shape}")
        print()

I wrote a simple test for this dynamic padding; you can check it too. You can also choose a padding strategy customized for your dataset, for example by replacing F.pad with interpolation or by using a different padding mode such as 'replicate' or 'reflect'.
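(Each test case should print an output whose spatial size matches its input, for example Input shape: torch.Size([1, 3, 93, 64]) and Output shape: torch.Size([1, 1, 93, 64]) for the first one.)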

Thank you very much. The problem is now solved with your code.