How to create a layer with different-sized filters

lesscomfortable · January 17, 2018, 11:19am

Hi!

I need to create a 2-layer convolutional net that takes as input a 3-224-224 image, uses 50 kernels of 3*3, 50 kernels of 4*4 and 50 kernels of 5*5 in each layer to perform convolutions and then returns an image.

Can someone help me with an example of this? I tried the following code with a batch size of 16 but the output is torch.Size([16, 142572]), not the 16-3-224-224 that I was expecting.

 import torch
 import torch.nn as nn
 
 class Net(nn.Module):
     def __init__(self):
         super(Net, self).__init__()
         self.layer1P = nn.Sequential(
             nn.Conv2d(3, 50, kernel_size=3, padding=1),
             nn.ReLU(),
             nn.Conv2d(50, 50, kernel_size=4, padding=1),
             nn.ReLU(),
             nn.Conv2d(50, 50, kernel_size=5, padding=1),
             nn.ReLU())
         self.layer2P = nn.Sequential(
             nn.Conv2d(50, 50, kernel_size=3, padding=1),
             nn.ReLU(),
             nn.Conv2d(50, 50, kernel_size=4, padding=1),
             nn.ReLU(),
             nn.Conv2d(50, 3, kernel_size=5, padding=1))
 
     def forward(self, x):
         h1 = self.layer1P(x)
         out = self.layer2P(h1)
         out = out.view(out.size(0), -1)
         return out

Thanks in advance!

ptrblck · January 17, 2018, 11:35am

Remove out = out.view(out.size(0), -1) from your forward method and you will get an output shape of [batch, 3, 218, 218].

.view works like reshape in numpy. So basically you are flattening your output, which gives [batch, 3*218*218=142572].
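
To illustrate the flattening, here is a minimal sketch that uses a zero tensor as a stand-in for the network output (the tensor names are only for illustration):

```python
import torch

# Stand-in for the conv output: batch of 16, 3 channels, 218 x 218.
out = torch.zeros(16, 3, 218, 218)

# Keep the batch dimension and collapse everything else into one axis.
flat = out.view(out.size(0), -1)
print(flat.shape)  # torch.Size([16, 142572])

# The same data can be viewed with its original 4-D shape again.
restored = flat.view(16, 3, 218, 218)
print(restored.shape)  # torch.Size([16, 3, 218, 218])
```

So nothing is lost by the .view call; it only changes how the same elements are indexed.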

If you need the same width and height (224) in your output, change the padding in your conv layers, i.e.:

padding=1 for kernel_size=3
padding=2 for kernel_size=5

Since kernel_size=4 is even, it’s a bit more complicated, because no symmetric padding keeps the size unchanged: padding=1 shrinks the output by 1 and padding=2 grows it by 1.
In your case, you could leave padding=1 in layer1P and set it to padding=2 in layer2P for kernel_size=4 and you will get an output with the shape [batch, 3, 224, 224].
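
Putting these changes together, the model could look roughly like the sketch below. It assumes stride 1 everywhere (the Conv2d default) and uses a random batch of 2 images only to check the shape; the per-layer size comments follow from the padding values above.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1P = nn.Sequential(
            nn.Conv2d(3, 50, kernel_size=3, padding=1),   # 224 -> 224
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=1),  # 224 -> 223
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=5, padding=2),  # 223 -> 223
            nn.ReLU())
        self.layer2P = nn.Sequential(
            nn.Conv2d(50, 50, kernel_size=3, padding=1),  # 223 -> 223
            nn.ReLU(),
            nn.Conv2d(50, 50, kernel_size=4, padding=2),  # 223 -> 224
            nn.ReLU(),
            nn.Conv2d(50, 3, kernel_size=5, padding=2))   # 224 -> 224

    def forward(self, x):
        h1 = self.layer1P(x)
        # No .view() here, so the output stays 4-D.
        return self.layer2P(h1)


# Random input used only to verify the output shape.
x = torch.randn(2, 3, 224, 224)
with torch.no_grad():
    out = Net()(x)
print(out.shape)  # torch.Size([2, 3, 224, 224])
```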

lesscomfortable · January 17, 2018, 11:48am

Thank you for the detailed answer! Do you know of a resource where I could learn how kernel size and padding affect the dimensions of the final output?

ptrblck · January 17, 2018, 12:00pm

Have a look at the lecture about CNNs in Stanford’s CS231n.
It makes the input and output sizes clear and gives you an intuition on what’s happening in a conv layer.
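
As a quick reference, the output size of a conv layer without dilation is floor((W + 2P - K) / S) + 1, where W is the input size, K the kernel size, P the padding and S the stride. The sketch below applies that formula to the layers from this thread; the helper names conv_out_size and trace are made up for this example and are not part of PyTorch.

```python
def conv_out_size(size, kernel_size, padding=0, stride=1):
    """Spatial output size of a conv layer without dilation."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace(size, layers):
    """Return the spatial size after each (kernel_size, padding) layer."""
    sizes = [size]
    for kernel_size, padding in layers:
        size = conv_out_size(size, kernel_size, padding)
        sizes.append(size)
    return sizes


# (kernel_size, padding) per conv layer, stride 1 throughout.
original = [(3, 1), (4, 1), (5, 1), (3, 1), (4, 1), (5, 1)]
adjusted = [(3, 1), (4, 1), (5, 2), (3, 1), (4, 2), (5, 2)]

print(trace(224, original))  # [224, 224, 223, 221, 221, 220, 218]
print(trace(224, adjusted))  # [224, 224, 223, 223, 223, 224, 224]
```

The first trace reproduces the 218 from the original code, and the second shows how the two kernel_size=4 layers cancel each other out.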