PyTorch code with 5 conv layers! Please help me to understand

Five conv layers, ReLU activation function:

import torch
import torch.nn as nn

class ConvNet(nn.Module):
    def __init__(self, num_classes=110):
        super(ConvNet, self).__init__()

        self.features = nn.Sequential(
            nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(224, 192, kernel_size=5, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),

            nn.Conv2d(192, 384, kernel_size=3, padding=1),
            nn.ReLU(),

            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(),

            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.ReLU(),

            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.avgpool = nn.AdaptiveAvgPool2d((6, 6))

        self.classifier = nn.Sequential(
            nn.Dropout(),
            nn.Linear(256 * 6 * 6, 4096),  # channels of the last conv layer * pooled output h * w
            nn.ReLU(),

            nn.Dropout(),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        out = self.classifier(x)
        return out

If I understand correctly, in the first conv layer the number of input channels is 3 and the convolution produces 224 output channels. But what is the input image?

Could you explain your question a bit more, and in particular which part of the code is unclear, please?

If I understand correctly, in the first conv layer the number of input channels is 3 and the convolution produces 224 output channels. But what is the input image?

Yes, the input tensor is expected to have 3 channels, and the output activation will have 224 channels.
The actual input is not defined here, since it will be passed to the model’s forward method after the model is initialized. This tutorial might be a good starter.
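For example, you could initialize the model and pass the input to it directly (the 224x224 input size here is just an assumption, not something the model defines):

import torch

model = ConvNet(num_classes=110)
x = torch.randn(1, 3, 224, 224)  # [batch_size, channels, height, width]
out = model(x)
print(out.shape)
# torch.Size([1, 110])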

My question is: how do I check that the dimensionality of all the conv layers is correct? For example, the second conv layer is nn.Conv2d(224, 192, kernel_size=5, padding=1). Does that mean that the MaxPool2d(kernel_size=3, stride=2) of the first stage does not change the 224 dimension?

Yes, pooling layers do not change the channel dimension of the input activation, but they do change its spatial size.
Conv layers apply their filters to the activation via a convolution or cross-correlation and are thus also flexible regarding the spatial size of the input, as long as their kernels are not larger than the input.
Also, 224 is often the spatial size of the input activation, while you are using it as the number of output channels. That works, but don’t confuse these dimensions.
The inputs are defined as [batch_size, channels, height, width].
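A quick sketch to illustrate both points (the shapes here are just examples taken from your model):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(kernel_size=3, stride=2)
out = pool(torch.randn(1, 224, 109, 109))
print(out.shape)
# torch.Size([1, 224, 54, 54]) -- same channels, smaller spatial size

conv = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
print(conv(torch.randn(1, 3, 224, 224)).shape)
# torch.Size([1, 224, 109, 109])
print(conv(torch.randn(1, 3, 128, 128)).shape)
# torch.Size([1, 224, 61, 61]) -- the same conv works on a different spatial size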

To make a more detailed calculation: if the input into the first conv layer is an image of 224x224 pixels, what will the output dimension be after the first conv layer, nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)?

You could use the formula defined in the docs to calculate the spatial output size. @J_Johnson provided a convenient utility method here to calculate it as well:

import math
import torch
import torch.nn as nn

def calc_conv_size(length, kernel_size, stride=1, padding=0, dilation=1):
    # spatial output size formula from the nn.Conv2d docs
    return math.floor((length + 2*padding - dilation*(kernel_size-1) - 1) / stride + 1)

conv = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(out.shape)
# torch.Size([1, 224, 109, 109])

print(calc_conv_size(length=224, kernel_size=11, stride=2, padding=2))
# 109

As you can see, you can also just run a forward pass and print the output shape.

Thank you. By doing this calculation, at the end of the fifth conv layer I get the following dimension:
torch.Size([1, 256, 127, 127])
Do you think it is right?

I attach the following:
# first conv layer
conv = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(out.shape)

torch.Size([1, 224, 109, 109])

relu = nn.ReLU()
x = torch.randn(1, 224, 109, 109)
out = relu(x)
print(out.shape)
maxpool2d = nn.MaxPool2d(kernel_size=3, stride=2)
x = torch.randn(1, 224, 109, 109)
out = maxpool2d(x)
print(out.shape)

# second conv layer
conv = nn.Conv2d(224, 192, kernel_size=5, padding=1)
x = torch.randn(1, 224, 192, 192)
out = conv(x)
print(out.shape)

relu = nn.ReLU()
x = torch.randn(1, 192, 190, 190)
out = relu(x)
print(out.shape)
maxpool2d = nn.MaxPool2d(kernel_size=3, stride=2)
x = torch.randn(1, 192, 190, 190)
out = maxpool2d(x)
print(out.shape)

# third conv layer
conv = nn.Conv2d(192, 384, kernel_size=3, padding=1)
x = torch.randn(1, 192, 384, 384)
out = conv(x)
print(out.shape)

torch.Size([1, 384, 384, 384])

relu = nn.ReLU()
x = torch.randn(1, 384, 384, 384)
out = relu(x)
print(out.shape)

# fourth conv layer
conv = nn.Conv2d(384, 256, kernel_size=3, padding=1)
x = torch.randn(1, 384, 256, 256)
out = conv(x)
print(out.shape)

torch.Size([1, 256, 256, 256])

relu = nn.ReLU()
x = torch.randn(1, 256, 256, 256)
out = relu(x)
print(out.shape)

# fifth conv layer
conv = nn.Conv2d(256, 256, kernel_size=3, padding=1)
x = torch.randn(1, 256, 256, 256)
out = conv(x)
print(out.shape)

torch.Size([1, 256, 256, 256])

relu = nn.ReLU()
x = torch.randn(1, 256, 256, 256)
out = relu(x)
print(out.shape)
maxpool2d = nn.MaxPool2d(kernel_size=3, stride=2)
x = torch.randn(1, 256, 256, 256)
out = maxpool2d(x)
print(out.shape)

torch.Size([1, 224, 109, 109])
torch.Size([1, 224, 109, 109])
torch.Size([1, 224, 54, 54])
torch.Size([1, 192, 190, 190])
torch.Size([1, 192, 190, 190])
torch.Size([1, 192, 94, 94])
torch.Size([1, 384, 384, 384])
torch.Size([1, 384, 384, 384])
torch.Size([1, 256, 256, 256])
torch.Size([1, 256, 256, 256])
torch.Size([1, 256, 256, 256])
torch.Size([1, 256, 256, 256])
torch.Size([1, 256, 127, 127])

Sorry, but I don’t understand your code and what you are trying to achieve with it.
E.g. here:

maxpool2d = nn.MaxPool2d(kernel_size=3, stride=2)
x = torch.randn(1, 224, 109, 109)
out = maxpool2d(x)
print(out.shape)

# second conv layer
conv = nn.Conv2d(224, 192, kernel_size=5, padding=1)
x = torch.randn(1, 224, 192, 192)
out = conv(x)
print(out.shape)

relu = nn.ReLU()
x = torch.randn(1, 192, 190, 190)
out = relu(x)
print(out.shape)

you are recreating the x tensor with a larger spatial shape than the output of the pooling layer could have.
What is this code supposed to do, and why do you recreate the tensors in each layer instead of reusing the output from the previous one?
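For comparison, here is a small sketch of what the second conv layer would actually see if you reused the outputs (layer definitions taken from your model):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
pool = nn.MaxPool2d(kernel_size=3, stride=2)

x = torch.randn(1, 3, 224, 224)
out = pool(torch.relu(conv1(x)))
print(out.shape)
# torch.Size([1, 224, 54, 54]) -- the second conv layer gets 54x54 inputs, not 192x192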

When I asked the question… “To make a more detailed calculation: if the input into the first conv layer is an image of 224x224 pixels, what will the output dimension be after the first conv layer, nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)?”
you answered that the output dimension after the first conv layer was 109x109. Right?
And you added this code to show it:

conv = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(out.shape)

torch.Size([1, 224, 109, 109])

In fact, the output dimension after print(out.shape) was 109.
Therefore the next ReLU was applied to an input of 109x109, and this is the reason I wrote the code:
relu = nn.ReLU()
x = torch.randn(1, 224, 109, 109)
out = relu(x)
print(out.shape)

And printed the shape (which was not affected by the ReLU, obviously). Then I proceeded to the next MaxPool2d, printing out.shape after applying the maxpool2d to the result of the previous layer.
That’s the way I proceeded up to the fifth layer and the final output dimension. Is it clear now?

Yes, but you could use the outputs directly, since your re-creation of the intermediate tensors is prone to errors. Just add the print statements between the layers and keep the original forward pass the same, i.e. reuse the output activation from the previous layer as the input to the next one.

Sorry, but I do not understand what you mean by re-creation. Can you give an example of how to do it? In your answer of April 8 you suggested this sequence of code:

conv = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
x = torch.randn(1, 3, 224, 224)
out = conv(x)
print(out.shape)

torch.Size([1, 224, 109, 109])
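Sure. A minimal sketch of what reusing the outputs could look like: feed each layer’s output into the next one and print the shape after every step (same layers as in the model above):

import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 224, kernel_size=11, stride=2, padding=2)
conv2 = nn.Conv2d(224, 192, kernel_size=5, padding=1)
conv3 = nn.Conv2d(192, 384, kernel_size=3, padding=1)
conv4 = nn.Conv2d(384, 256, kernel_size=3, padding=1)
conv5 = nn.Conv2d(256, 256, kernel_size=3, padding=1)
pool = nn.MaxPool2d(kernel_size=3, stride=2)
relu = nn.ReLU()

x = torch.randn(1, 3, 224, 224)  # [batch_size, channels, height, width]
out = pool(relu(conv1(x)))
print(out.shape)  # torch.Size([1, 224, 54, 54])
out = pool(relu(conv2(out)))
print(out.shape)  # torch.Size([1, 192, 25, 25])
out = relu(conv3(out))
print(out.shape)  # torch.Size([1, 384, 25, 25])
out = relu(conv4(out))
print(out.shape)  # torch.Size([1, 256, 25, 25])
out = pool(relu(conv5(out)))
print(out.shape)  # torch.Size([1, 256, 12, 12])

Note that the final activation for a 224x224 input is [1, 256, 12, 12], not [1, 256, 127, 127]; the larger result came from recreating the intermediate tensors with the wrong spatial sizes.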