nn.Conv2d: computing the number of features output from nn.Conv2d

I have an nn.Linear() following an nn.Conv2d().

I understand that in_features of nn.Linear() must be calculated from x, by computing channels * height * width of x.

I know the documentation for nn.Conv2d covers this, but this is the part I don't get:

I don't follow this post on nn.Conv2d output computation, but it says you can get the dimensions of x with x.shape.

I still don't understand how to apply the equations from the nn.Conv2d documentation.

I know:

  1. The input features are the pixel size (i.e., H*W) of the image times the number of channels, and
  2. the pixel size (H*W) of your image after applying Conv2d comes from the equations in the docs, but I don't understand them. Are we just defining notation in the Input: and Output: parts?

Then the documentation gives formulas for H_out and W_out, the height and width of the image output from Conv2d, in terms of H_in and W_in, the height and width of the image input to Conv2d. I don't understand how to apply those equations.
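For reference, the formulas from the Conv2d docs are:

H_out = floor((H_in + 2*padding[0] - dilation[0]*(kernel_size[0] - 1) - 1) / stride[0] + 1)
W_out = floor((W_in + 2*padding[1] - dilation[1]*(kernel_size[1] - 1) - 1) / stride[1] + 1)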

  1. How do I get H_in and W_in? Would those be x.shape[2] and x.shape[3], based on the assumption that what comes after Input: and Output: are the dimensions of x, i.e., that N and the number of channels are the first two dimensions? (See the sketch below.)
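A minimal sketch of that assumption, using a hypothetical random tensor in place of a real batch:

import torch

x = torch.rand(4, 1, 28, 28)         # hypothetical input: [N, C, H, W]
print(x.shape)                       # torch.Size([4, 1, 28, 28])
H_in, W_in = x.shape[2], x.shape[3]
print(H_in, W_in)                    # 28 28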

Assuming:

  • we are using x as it is after the line just before nn.Linear(), and
  • Input: and Output: above show Conv2d's input and output dimension orders and the notation for the equations.
  1. Are padding, dilation, and kernel_size taken from the parameter values of the preceding Conv2d call? If the preceding call is:
    self.conv2 = nn.Conv2d(6, 16, 5)
    which means self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5), then kernel_size is 5, but what is kernel_size[0], and where do I get padding and dilation?

Then, say we have a call to nn.Linear afterward. Its in_features parameter would be C * H_out * W_out, where:
C = the out_channels parameter used in the prior Conv2d call?

The current sad state of the code:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
        self.conv1 = nn.Conv2d(in_channels = 1, out_channels = 6, kernel_size = 5)
        
        self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
       
        self.pool = nn.MaxPool2d(2, 2)
        # To decide the number of input features your function can take, you first need to figure out
        # the output pixel size of your final Conv2d. Say your final Conv2d returns a 3x3 pixel image
        # with 50 output channels; in that case the input features to your linear function should be
        # 3*3*50, and you can have any number of output features.

        # calculate H & W of self.conv2 output from
        # I GOT STUCK HERE. I WENT TO TUESDAY OFFICE HOURS 5 MIN LATE AND NO ONE WAS THERE
        # I DIDN'T GET AN ANSWER TO MY POST #1032 ABOUT POST #815
        # print(self.conv2.shape)
        # It seems self.conv2 has no shape attribute.
        # H_in =
        # H =
        # W =
        #Linear(in_features, out_features, bias=True, device=None, dtype=None)
        #self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc1 = nn.Linear(16 * H * W, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x)) #relu is an activation function
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
net = Net()

Yes, the spatial dimensions (height and width) are defined in dim2 and dim3, respectively, for the expected 4-dimensional input in the shape [batch_size, channels, height, width].

Yes

If you specify kernel_size with a single int, the height and width of the kernel will both be set to this value, so kernel_size[0] and kernel_size[1] will each have a value of 5. The docs also explain this behavior.

If you don't specify padding and dilation explicitly, the default values will be used, as also given in the docs.
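A small sketch of how you could check this on the layer itself (these hyperparameters are stored as tuple attributes):

import torch

conv2 = torch.nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
print(conv2.kernel_size)  # (5, 5) -- the single int is expanded to both dimensions
print(conv2.stride)       # (1, 1), the default
print(conv2.padding)      # (0, 0), the default
print(conv2.dilation)     # (1, 1), the default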

Yes, that’s correct.

Thank you!

In def __init__(self): I have …

        self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)

This last line above is nn.Linear(16 * h * w, 120), and I need h and w, but self.conv2.shape doesn't exist, so self.conv2.shape[2] doesn't give me the height of an image.
If I add
print(self.conv2.shape)
I get:
'Conv2d' object has no attribute 'shape'

For your parameters (since you are not giving values for padding, stride, and dilation, the defaults padding = 0, stride = 1, and dilation = 1 are used) the equation would look like this:

H_out = floor((H_in + 2*0 - 1*(5 - 1) - 1) / 1 + 1) = H_in - 4

and likewise W_out = W_in - 4.

If you want to use this approach instead of solving the equation, then you need to pass something through the conv to see what is happening.

# Example
conv2 = torch.nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
img = torch.rand(1, 6, 50, 50)

print(conv2(img).shape)

# Output:
# torch.Size([1, 16, 46, 46])

Which is the same result that we got from the formula.

As @Matias_Vasquez described, you could either calculate the activation shape manually or pass an input to the model and print the actual shape.
E.g. in your model you would most likely have a few conv, pool, etc. layers before the first linear layer is used.
If you don’t want to calculate the in_features, set this argument to any random value (e.g. 1), pass the input tensor to the model, and print the activation shape before passing it to the linear layer:

x = self.conv(x)
x = x.view(x.size(0), -1)
print(x.shape)
x = self.fc(x) # will raise a shape mismatch here

Then use the printed shape of dim1 to set the in_features.
Alternatively, you could also use the lazy layers e.g. via nn.LazyLinear to let this layer calculate the in_features for you.

I was trying to print the shape of the conv2 object to see how I can get H_in.

Are you saying I should literally copy/paste that somewhere in my existing code? In the construction of the net class? I’m not sure how or where to incorporate this into my existing code.

This isn’t possible as the conv layer can work on a variable input shape as long as the spatial shape of the input is large enough for the current conv setup.
The output shape thus depends on the input shape and the conv setup. The conv layer alone will not define the output shape.
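For example, the same layer produces different output shapes for different input sizes (a small sketch with made-up inputs):

import torch

conv = torch.nn.Conv2d(6, 16, 5)
print(conv(torch.rand(1, 6, 50, 50)).shape)  # torch.Size([1, 16, 46, 46])
print(conv(torch.rand(1, 6, 28, 28)).shape)  # torch.Size([1, 16, 24, 24])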

Sorry guys, I'm so lost. I don't really understand this stuff and was coasting despite that by asking questions and getting help with the coding. Nothing about NNs has made sense to me.

In the code in my question, this line gives me a matrix multiplication error:
self.fc1 = nn.Linear(16 * 5 * 5, 120).
I'm told the first parameter is wrong, but can be calculated.
The calculation includes H_in, which I don't know how to get.
What's the easy way to get an in_features value that will run?

Like this?
self.fc1 = nn.Linear(nn.LazyLinear, 120)

module 'torch.nn' has no attribute 'LazyLinear'

See my previous post with a code snippet which shows how to print the activation shape.

No, nn.LazyLinear is a layer, not an argument of nn.Linear.

self.fc1 = nn.LazyLinear(120)

should work.

self.fc1 = nn.LazyLinear(120)
module 'torch.nn' has no attribute 'LazyLinear'

You might need to update your PyTorch version to be able to use this layer, as your version seems to be too old.

This code is all going inside

class Net(nn.Module):
with a

def __init__(self):
        super().__init__()

part and a
def forward(self, x):
part.

I had this in the constructor(?), def __init__(self): part:

self.conv2 = nn.Conv2d(in_channels = 6, out_channels = 16, kernel_size = 5)
self.pool = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(16 * 5 * 5, 120)

You are suggesting I modify

x = self.conv(x)
x = x.view(x.size(0), -1)
print(x.shape)
x = self.fc(x) # will raise a shape mismatch here

to work in this context.

It almost looks like your syntax is different. How does the code in the model/class definition correspond to your code? The class has

self.someName = someFunction(someParameters)
self.someName1 = someFunction1(someParameters1)

Your code looks like
x = self.someName(x)
x = self.someName1(x)

I don't know how to translate between them.

I’m just trying to adapt the code in
https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html#define-a-convolutional-neural-network
to work with MNIST input, which is one-channel greyscale instead of 3 channels like CIFAR10.

My code should be used in the forward method, not the __init__.
As you can see in the tutorial, the __init__ initializes the modules, while the forward uses them.
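E.g. a minimal sketch of how the debug print would fit into your existing forward method (the print is the only addition):

def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))
    x = self.pool(F.relu(self.conv2(x)))
    x = torch.flatten(x, 1)  # flatten all dimensions except batch
    print(x.shape)           # dim1 of this shape is the in_features nn.Linear needs
    x = F.relu(self.fc1(x))  # raises the shape mismatch if in_features is wrong
    x = F.relu(self.fc2(x))
    x = self.fc3(x)
    return x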

I don't know how, or if that's possible or allowed:
I have these imports:

%matplotlib inline

import torch
import torchvision
import torchvision.transforms as transforms

import matplotlib.pyplot as plt
import numpy as np

import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import random

import os

I have the constructor:

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

I’m getting a matrix multiplication error because
self.fc1 = nn.Linear(16 * 5 * 5, 120) has the wrong number of input features.
How do I find the number of input features using these H_out, H_in equations, or otherwise work around the error? Don't I need to replace “16 * 5 * 5” with something?

The full model definition is:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x)) #relu is an activation function
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
net = Net()

In this context, could I apply your suggestion, and how?

I don't see how you get the result from the equations. You are still missing H_in. How do I get that when I have:

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        #self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv1 = nn.Conv2d(1, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = torch.flatten(x, 1) # flatten all dimensions except batch
        x = F.relu(self.fc1(x)) #relu is an activation function
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x
    
net = Net()

As you know, I am seeing a shape mismatch error in the first linear layer: a matrix multiplication error of (a x b) @ (c x d) where b != c.

I don't see where or how to get H_in and W_in for the equations to calculate the in_features manually.

My environment or something does not recognize LazyLinears when I replace
self.fc1 = nn.Linear(16 * 5 * 5, 120)
with
self.fc1 = nn.LazyLinears(120).

You can print the shape of x right before you feed it to your first linear layer.

print(x.shape)

As @ptrblck said, the forward method uses your layers when you pass something through your network.

For debugging purposes you can add a print statement in this forward method so that you get an idea of what is happening before you get your error.

So H_in refers to the height of the input for each layer. This means that H_out from conv1 becomes H_in for whatever follows it (here the pooling layer, and then conv2).

Also, the shapes are in the format BxCxHxW = Batch x Channel x Height x Width.
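Putting it all together for your model, here's a sketch assuming MNIST's 1-channel 28x28 inputs:

import torch

conv1 = torch.nn.Conv2d(1, 6, 5)   # 28x28 -> 24x24 (a 5x5 kernel removes 4 pixels per dim)
pool = torch.nn.MaxPool2d(2, 2)    # halves the height and width
conv2 = torch.nn.Conv2d(6, 16, 5)  # 12x12 -> 8x8

x = torch.rand(1, 1, 28, 28)       # [B, C, H, W]
x = pool(torch.relu(conv1(x)))
print(x.shape)                     # torch.Size([1, 6, 12, 12])
x = pool(torch.relu(conv2(x)))
print(x.shape)                     # torch.Size([1, 16, 4, 4])
print(torch.flatten(x, 1).shape)   # torch.Size([1, 256])

So for MNIST the first linear layer would need in_features = 16 * 4 * 4 = 256, i.e. nn.Linear(16 * 4 * 4, 120).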
