Concatenate two tensors with different sizes

      Dear senior programmers,

I obtained the following network structure by modifying someone else's network. I added the dilation keyword to get dilated convolutional layers. However, since there are some concatenations in the forward pass, I have not been able to adjust the output channels and the concatenations properly. Could anyone explain how to fix this? The network is as follows.

import torch
from torch import nn
from torch.nn import functional as F

class net(nn.Module):
    def __init__(self):
        super(net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=2, dilation=2)
        self.conv3 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=5, padding=2, dilation=2)
        self.conv4 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=7, padding=3, dilation=2)
        self.conv5 = nn.Conv2d(in_channels=12, out_channels=3, kernel_size=3, padding=1)
        self.b = 1

    def forward(self, x): 
        
        x1 = F.relu(self.conv1(x))
        #print(x1.shape)
        x2 = F.relu(self.conv2(x1))
        print(x2.shape)
        cat1 = torch.cat((x1, x2), 2)
        x3 = F.relu(self.conv3(cat1))
        print(x3.shape)
        cat2 = torch.cat((x2, x3), 2)
        x4 = F.relu(self.conv4(cat2))
        cat3 = torch.cat((x1, x2, x3, x4),2)
        k = F.relu(self.conv5(cat3))

        if k.size() != x.size():
            raise Exception("k, haze image are different size!")

        output = k * x - k + self.b
        return F.relu(output)

The runtime error is as follows.

RuntimeError: invalid argument 0: Sizes of tensors must match except in dimension 2. Got 476 and 480 in dimension 3 at /opt/conda/conda-bld/pytorch_1573049304260/work/aten/src/THC/generic/THCTensorMath.cu:71
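For context, this failure can be reproduced in isolation; `torch.cat` requires every dimension except the concatenation dimension to match (the shapes below are illustrative, not the actual 476/480 sizes from the error):

```python
import torch

a = torch.rand(1, 3, 28, 28)
b = torch.rand(1, 3, 52, 24)

try:
    # cat along dim 2 requires dims 0, 1 and 3 to match; here 28 != 24 in dim 3
    torch.cat((a, b), 2)
except RuntimeError as e:
    print("cat failed:", e)
```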

How can I fix this error? I would also like to understand the relationship between input channels, output channels, padding, and dilation.

Thank you for your time and patience

Hi @Patrice,
*Rule1: to concatenate tensors, they must match in every dimension except the one you concatenate along (e.g. NxC1xHxW with NxC2xHxW along dim 1, or NxCxH1xW with NxCxH2xW along dim 2, etc.)
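A quick sanity check of the rule (shapes chosen only for illustration):

```python
import torch

a = torch.rand(1, 3, 28, 28)
b = torch.rand(1, 5, 28, 28)  # differs from a only in dim 1 (channels)

# allowed: every dimension except the cat dimension (1) matches
print(torch.cat((a, b), 1).shape)  # torch.Size([1, 8, 28, 28])

# concatenating these two along dim 2 instead would fail,
# because dim 1 (3 vs 5) no longer matches
```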
In your case, suppose the input x is 1x3x28x28.

what you are doing is:

  • you do convolution conv1; the output x1 is 1x3x28x28
  • you do convolution conv2; the output x2 is 1x3x28x28
  • (a) you do concatenation cat1 of x1 and x2 at axis 2; the output is 1x3x56x28 (incorrect axis: feature maps should be stacked along the channel axis, 1)
  • you do convolution conv3; the output x3 is 1x3x52x24
  • (b) you do concatenation cat2 of x2 and x3 at axis 2 again, but x2 is 1x3x28x28 and x3 is 1x3x52x24. They differ in both dimension 2 and dimension 3, which violates *Rule1 and raises the RuntimeError.

Hence, you need to fix (a) and (b) to make sure they conform to *Rule1.
Here is the corrected model with dilated convolutions :).

import torch
from torch import nn
from torch.nn import functional as F

class net(nn.Module):   
    def __init__(self):
        super(net, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=1)
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=2, dilation=2)
        self.conv3 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=5, padding=4, dilation=2)
        self.conv4 = nn.Conv2d(in_channels=6, out_channels=3, kernel_size=7, padding=6, dilation=2)
        self.conv5 = nn.Conv2d(in_channels=12, out_channels=3, kernel_size=3, padding=1)
        self.b = 1

    def forward(self, x, debug=False): 
        
        x1 = F.relu(self.conv1(x))
        if debug: print('x1:', x1.shape)
        x2 = F.relu(self.conv2(x1))
        if debug: print('x2:', x2.shape)
        cat1 = torch.cat((x1, x2), 1)
        if debug: print('cat1:', cat1.shape)
        x3 = F.relu(self.conv3(cat1))
        if debug: print('x3:', x3.shape)
        cat2 = torch.cat((x2, x3), 1)
        if debug: print('cat2:', cat2.shape)
        x4 = F.relu(self.conv4(cat2))
        if debug: print('x4:', x4.shape)
        cat3 = torch.cat((x1, x2, x3, x4),1)
        if debug: print('cat3:', cat3.shape)
        k = F.relu(self.conv5(cat3))
        if debug: print('k:', k.shape)

        if k.size() != x.size():
            raise Exception("k, haze image are different size!")

        output = k * x - k + self.b
        if debug: print('output:', output.shape)
        return F.relu(output)
    
net_instance = net()
x = torch.rand(1, 3, 28, 28)
with torch.no_grad():
    out = net_instance(x, debug=True)

#OUTPUT:    
#x1: torch.Size([1, 3, 28, 28])
#x2: torch.Size([1, 3, 28, 28])
#cat1: torch.Size([1, 6, 28, 28])
#x3: torch.Size([1, 3, 28, 28])
#cat2: torch.Size([1, 6, 28, 28])
#x4: torch.Size([1, 3, 28, 28])
#cat3: torch.Size([1, 12, 28, 28])
#k: torch.Size([1, 3, 28, 28])
#output: torch.Size([1, 3, 28, 28])

Hope it helps, cheers~

Isn't the cat of x1 and x2 at axis 2 supposed to be 1x3x56x28? I don't see any transpose happening.

By the way, welcome to the community :wink:.

Dear Mr. Brilian, I am very grateful for your prompt reply. The code is now running. Please, I would like to get a better picture of convolution, padding and dilation.

After conv1, the output is 1x3x28x28; I understand this. But applying conv2 to x1, how did you get 1x3x28x28? Shouldn't it be 1x3x26x26, given that the kernel is 3?

I am sorry if my question might seem silly. I had read about convolution, padding and dilation. However, it seems like I still have not got it clearly.

Yes sir, you are right, the cat of x1 and x2 at axis 2 should be 1x3x56x28. Please, could you clarify how the output of conv2(x1), i.e. x2, is obtained?

Please refer to relationship 15 on page 28 of this excellent guide to convolution arithmetic.

Hi @Patrice, some explanation of conv2(x1):

Notes:

Padding: adds zero values around the edges of your image; e.g. padding=1 in PyTorch changes x1 from 1x3x28x28 to 1x3x30x30 (1 pixel top/bottom and left/right), etc.

Dilation: expands your kernel by inserting gaps between its elements; e.g. kernel_size=3 gives a 3x3 kernel, and with dilation=2 it covers the same area as a 5x5 kernel, etc.
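Both notes can be checked directly; a small sketch, using the shapes from the running example:

```python
import torch
import torch.nn.functional as F

x1 = torch.rand(1, 3, 28, 28)

# padding=1 adds one zero pixel on every edge: 28 -> 30 in each spatial dim
padded = F.pad(x1, (1, 1, 1, 1))
print(padded.shape)  # torch.Size([1, 3, 30, 30])

# a 3x3 kernel with dilation=2 spans the same area as a 5x5 kernel:
# effective size = dilation * (kernel_size - 1) + 1
print(2 * (3 - 1) + 1)  # 5
```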

To answer your question:

    • with a 3x3 kernel on 1x3x28x28, padding=0 and dilation=1, the output is 1x3x26x26.
    • with the same kernel, padding=0 and dilation=2, the output is 1x3x24x24.
    • with padding=1 and dilation=2, the output is 1x3x26x26.
    • with padding=2 and dilation=2, the output is 1x3x28x28 (this is the conv2 case above).
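All four cases follow the stride-1 output-size relation out = in + 2*padding - dilation*(kernel_size - 1); a small sketch verifying it against nn.Conv2d:

```python
import torch
from torch import nn

def out_size(i, k, p, d):
    # stride-1 Conv2d output size along one spatial dimension
    return i + 2 * p - d * (k - 1)

x = torch.rand(1, 3, 28, 28)
for p, d in [(0, 1), (0, 2), (1, 2), (2, 2)]:
    conv = nn.Conv2d(3, 3, kernel_size=3, padding=p, dilation=d)
    h = conv(x).shape[-1]
    print(f"padding={p}, dilation={d} -> {h}x{h}  (formula: {out_size(28, 3, p, d)})")
```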

Thanks for the welcome @harsha_g
Hope it helps, cheers ~

Thank you very much Mr. Brilian. It is much clearer now.

Thank you sir. I have gone through it and it has been very helpful.

good point, it is a typo, thanks for the correction :slight_smile:

Cool! And happy coding :slight_smile: