Why is the shape of the CNN output different during training vs. inference?

Hi,

I am working with Cat and Dog images and I have defined my model like this:

class CatAndDogConvNet(nn.Module):

    def __init__(self):
        super().__init__()

        # convolutional layers (3, 16, 32)
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 16, kernel_size=(5, 5), stride=2, padding=1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(in_channels = 16, out_channels = 32, kernel_size=(5, 5), stride=2, padding=1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size=(3, 3), padding=1)
        self.maxpool = nn.MaxPool2d(2)

        # fully connected layers
        self.fc1 = nn.Linear(in_features= 64 * 6 * 6, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=50)
        self.fc3 = nn.Linear(in_features=50, out_features=2)

        self.maxpool = nn.MaxPool2d(2)

    def forward(self, X):

        X = self.maxpool(F.relu(self.conv1(X)))
        # print(X.shape)
        X = self.maxpool(F.relu(self.conv2(X)))
        # print(X.shape)
        X = self.maxpool(F.relu(self.conv3(X)))
        # print(X.shape)
        X = X.view(X.shape[0], -1)
        # print(X.shape)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)

        return X

During training, the shapes printed after each maxpool (and after the final view) are:

torch.Size([100, 16, 55, 55])
torch.Size([100, 32, 13, 13])
torch.Size([100, 64, 6, 6])
torch.Size([100, 2304])

However, during inference, the output shapes are:

[1, 16, 111, 111]
[1, 16, 55, 55]
[1, 32, 27, 27] 
[1, 64, 27, 27]

I don’t understand what’s going on here.

What is the size of the input during training and during inference? (Could you put another print statement before conv1 in the forward method to check?)
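
For reference, a minimal sketch of where that print could go (this is just the forward from the class above, with one line added at the top):

    def forward(self, X):

        # print the input shape before the first conv layer, so you can check it at runtime
        print(X.shape)

        X = self.maxpool(F.relu(self.conv1(X)))
        X = self.maxpool(F.relu(self.conv2(X)))
        X = self.maxpool(F.relu(self.conv3(X)))
        X = X.view(X.shape[0], -1)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)

        return X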

The input size is the same during training and inference: [3, 224, 224].

If this helps: I loaded the model and printed its definition, and got this:

model = CatAndDogConvNet()
model.load_state_dict(torch.load('ML_project.pt'))

This is the definition:

CatAndDogConvNet(
  (conv1): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=2304, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

I would expect a batch dimension, i.e. something like [batch_size, 3, 224, 224]. I think it’s worth actually doing the print statement at the very start of the forward pass, like you did after the other layers.

Yeah. What I don’t understand is why all of the maxpool layers are not showing in the definition of the model.

It is showing up in the first place where you defined it (right after conv1). You used the same name for the other maxpool definitions, so the later assignments just overwrite the earlier ones and only a single maxpool attribute remains.

So are different names needed? Why?

a = 1 + 1
a = 5
print(a)
#5

They are variables, so if you assign a new value to a variable, it overwrites the original value.

Additionally, maxpool has no learnable parameters, so I don’t see any problem with just defining it once and using it several times in your forward pass. You can therefore delete the duplicate definitions, since they all assign to the same attribute anyway.
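
For example, the __init__ could keep a single maxpool and the forward pass stays exactly the same (a sketch based on the class above):

    def __init__(self):
        super().__init__()

        # convolutional layers (3, 16, 32)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 5), stride=2, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5), stride=2, padding=1)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3, 3), padding=1)

        # a single maxpool module, reused after every conv in forward
        self.maxpool = nn.MaxPool2d(2)

        # fully connected layers
        self.fc1 = nn.Linear(in_features=64 * 6 * 6, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=50)
        self.fc3 = nn.Linear(in_features=50, out_features=2)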

But they are functions, not variables. So a function with the same name can be called multiple times, right?

The variable holds a value; in this case that value is a callable object (the MaxPool2d module), which you can call like a function.
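
A tiny standalone illustration of that point (not taken from your model, just an example):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2)   # `pool` is a name bound to a MaxPool2d object
pool = nn.MaxPool2d(2)   # rebinding the same name discards the first object

x = torch.randn(1, 16, 8, 8)
print(pool(x).shape)     # the object is callable like a function: torch.Size([1, 16, 4, 4])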

And if you can help me with this as well: I want to get the output of the first linear layer, so I defined a new model like this:

model_new = torch.nn.Sequential(*list(model.children())[:7])
model_new
Sequential(
  (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Linear(in_features=2304, out_features=500, bias=True)
)

But when I pass my images through it,

res_4 = []
model_new = torch.nn.Sequential(*list(model.children())[:7])
for i in range(len(imgs)):
    temp = model_new(imgs[i][0])
    res_4.append([temp, imgs[i][1]])

I get the following error:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_95235/516937314.py in <module>
      2 model_new = torch.nn.Sequential(*list(model.children())[:7])
      3 for i in range(len(imgs)):
----> 4     temp = model_new(imgs[i][0])
      5     res_4.append([temp, imgs[i][1]])

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    202     def forward(self, input):
    203         for module in self:
--> 204             input = module(input)
    205         return input
    206

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):

--> 114             return F.linear(input, self.weight, self.bias)
    115
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (384x6 and 2304x500)

This is what I am getting for the model when I pass an image size to summary():

summary(model_new, (1, 3, 224, 224))
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               --                        --
├─Conv2d: 1-1                            [1, 16, 111, 111]         1,216
├─MaxPool2d: 1-2                         [1, 16, 55, 55]           --
├─Conv2d: 1-3                            [1, 32, 27, 27]           12,832
├─MaxPool2d: 1-4                         [1, 32, 13, 13]           --
├─Conv2d: 1-5                            [1, 64, 13, 13]           18,496
├─MaxPool2d: 1-6                         [1, 64, 6, 6]             --
==========================================================================================
Total params: 32,544
Trainable params: 32,544
Non-trainable params: 0
Total mult-adds (M): 27.46
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 1.85
Params size (MB): 0.13
Estimated Total Size (MB): 2.58

Without doing the math, my guess would be that some of your images are not the size you think they are. Hence my suggestion earlier in this thread to actually print the image size in real time, so that when it fails you know which image size triggered the failure.

Just before you pass your input batch into your model with model(imgs[i][0]), what is the shape of imgs[i][0] in the iteration where it crashes?
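
Something like this would show it (your loop from above, with the print added; the names are the ones you already use):

res_4 = []
model_new = torch.nn.Sequential(*list(model.children())[:7])
for i in range(len(imgs)):
    # print the input shape right before the forward pass, so the failing size is visible
    print(i, imgs[i][0].shape)
    temp = model_new(imgs[i][0])
    res_4.append([temp, imgs[i][1]])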


But they are the images from Kaggle. So you think that some images could have a different shape? It works until I get to the linear layer; through the conv layers it works fine.

Sorry, I made this edit right when you posted your follow-up. This information will help you understand the error (assuming that your model sometimes runs and sometimes crashes).

This is the shape:
torch.Size([1, 3, 224, 224])
I don’t know how to print the shape of each layer’s output during inference. But I ran the model up to the last maxpool and got this output:
torch.Size([1, 64, 6, 6])
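
For what it’s worth, one common way to print every layer’s output shape during inference is to register forward hooks; a minimal sketch, assuming model is the loaded CatAndDogConvNet from above:

import torch

# register a forward hook on every leaf module so each one prints its output shape
hooks = []
for name, module in model.named_modules():
    if len(list(module.children())) == 0:   # leaf modules only (convs, pool, linears)
        hooks.append(module.register_forward_hook(
            lambda mod, inp, out, name=name: print(name, tuple(out.shape))
        ))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

# remove the hooks when done
for h in hooks:
    h.remove()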