Why is the shape of the CNN output different during training vs. inference?

Hi,

I am working with Cat and Dog images and I have defined my model like this:

class CatAndDogConvNet(nn.Module):

    def __init__(self):
        super().__init__()

        # convolutional layers (3, 16, 32)
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels = 16, kernel_size=(5, 5), stride=2, padding=1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(in_channels = 16, out_channels = 32, kernel_size=(5, 5), stride=2, padding=1)
        self.maxpool = nn.MaxPool2d(2)
        self.conv3 = nn.Conv2d(in_channels = 32, out_channels = 64, kernel_size=(3, 3), padding=1)
        self.maxpool = nn.MaxPool2d(2)

        # fully connected layers
        self.fc1 = nn.Linear(in_features= 64 * 6 * 6, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=50)
        self.fc3 = nn.Linear(in_features=50, out_features=2)

        self.maxpool = nn.MaxPool2d(2)

    def forward(self, X):

        X = self.maxpool(F.relu(self.conv1(X)))
        # print(X.shape)
        X = self.maxpool(F.relu(self.conv2(X)))
        # print(X.shape)
        X = self.maxpool(F.relu(self.conv3(X)))
        # print(X.shape)
        X = X.view(X.shape[0], -1)
        # print(X.shape)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)

        return X

During training, the shapes printed after each maxpool (and after the final view) are:

torch.Size([100, 16, 55, 55])
torch.Size([100, 32, 13, 13])
torch.Size([100, 64, 6, 6])
torch.Size([100, 2304])

However, during inference, the output shapes are:

[1, 16, 111, 111]
[1, 16, 55, 55]
[1, 32, 27, 27] 
[1, 64, 27, 27]

I don’t understand what’s going on here.

What is the size of the input during training and during inference? (Could you put another print statement before conv1 in the forward method to check?)
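
For reference, a minimal sketch of where that print could go (this is just the forward from the class above, with one line added at the top):

    def forward(self, X):

        # print the input shape before the first conv layer, so you can check it at runtime
        print(X.shape)

        X = self.maxpool(F.relu(self.conv1(X)))
        X = self.maxpool(F.relu(self.conv2(X)))
        X = self.maxpool(F.relu(self.conv3(X)))
        X = X.view(X.shape[0], -1)
        X = F.relu(self.fc1(X))
        X = F.relu(self.fc2(X))
        X = self.fc3(X)

        return X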

The input size is the same during training and inference: [3, 224, 224].

If this helps: I loaded the model and printed its definition, and got this:

model = CatAndDogConvNet()
model.load_state_dict(torch.load('ML_project.pt'))

This is the definition:

CatAndDogConvNet(
  (conv1): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (maxpool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (conv3): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (fc1): Linear(in_features=2304, out_features=500, bias=True)
  (fc2): Linear(in_features=500, out_features=50, bias=True)
  (fc3): Linear(in_features=50, out_features=2, bias=True)
)

I would expect a batch dimension, i.e. something like [batch_size, 3, 224, 224]. I think it’s worth actually doing the print statement at the very start of the forward pass, like you did after the other layers.

Yeah. What I don’t understand is why all of the maxpool layers are not showing in the definition of the model.

It is showing up in the first place where you defined it (right after conv1). You used the same name for the other maxpool definitions, so the later assignments just overwrite the earlier ones and only a single maxpool attribute remains.

So are different names needed? Why?

a = 1 + 1
a = 5
print(a)
#5

They are variables, so if you assign a new value to a variable, it overwrites the original value.

Additionally, maxpool has no learnable parameters, so I don’t see any problem with just defining it once and using it several times in your forward pass. You can therefore delete the duplicate definitions, since they all assign to the same attribute anyway.
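
For example, the __init__ could keep a single maxpool and the forward pass stays exactly the same (a sketch based on the class above):

    def __init__(self):
        super().__init__()

        # convolutional layers (3, 16, 32)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=(5, 5), stride=2, padding=1)
        self.conv2 = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=(5, 5), stride=2, padding=1)
        self.conv3 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=(3, 3), padding=1)

        # a single maxpool module, reused after every conv in forward
        self.maxpool = nn.MaxPool2d(2)

        # fully connected layers
        self.fc1 = nn.Linear(in_features=64 * 6 * 6, out_features=500)
        self.fc2 = nn.Linear(in_features=500, out_features=50)
        self.fc3 = nn.Linear(in_features=50, out_features=2)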

But they are functions, not variables. So a function with the same name can be called multiple times, right?

The variable holds a value; in this case that value is a callable object (the MaxPool2d module), which you can call like a function.
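
A tiny standalone illustration of that point (not taken from your model, just an example):

import torch
import torch.nn as nn

pool = nn.MaxPool2d(2)   # `pool` is a name bound to a MaxPool2d object
pool = nn.MaxPool2d(2)   # rebinding the same name discards the first object

x = torch.randn(1, 16, 8, 8)
print(pool(x).shape)     # the object is callable like a function: torch.Size([1, 16, 4, 4])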

And if you can help me with this as well: I want to get the output of the first linear layer, so I defined a new model like this:

model_new = torch.nn.Sequential(*list(model.children())[:7])
model_new
Sequential(
  (0): Conv2d(3, 16, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (1): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (2): Conv2d(16, 32, kernel_size=(5, 5), stride=(2, 2), padding=(1, 1))
  (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (4): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (6): Linear(in_features=2304, out_features=500, bias=True)
)

But when I pass my images through it,

res_4 = []
model_new = torch.nn.Sequential(*list(model.children())[:7])
for i in range(len(imgs)):
    temp = model_new(imgs[i][0])
    res_4.append([temp, imgs[i][1]])

I get the following error:

RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_95235/516937314.py in <module>
      2 model_new = torch.nn.Sequential(*list(model.children())[:7])
      3 for i in range(len(imgs)):
----> 4     temp = model_new(imgs[i][0])
      5     res_4.append([temp, imgs[i][1]])

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190             return forward_call(*input, **kwargs)
   1191         # Do not call functions when jit is used
   1192         full_backward_hooks, non_full_backward_hooks = [], []

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/container.py in forward(self, input)
    202     def forward(self, input):
    203         for module in self:
--> 204             input = module(input)
    205         return input
    206

~/anaconda3/envs/torch/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
   1188         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189                 or _global_forward_hooks or _global_forward_pre_hooks):

--> 114             return F.linear(input, self.weight, self.bias)
    115
    116     def extra_repr(self) -> str:

RuntimeError: mat1 and mat2 shapes cannot be multiplied (384x6 and 2304x500)

This is what I am getting for the model when I pass an image size to summary():

summary(model_new, (1, 3, 224, 224))
==========================================================================================
Layer (type:depth-idx)                   Output Shape              Param #
==========================================================================================
Sequential                               --                        --
├─Conv2d: 1-1                            [1, 16, 111, 111]         1,216
├─MaxPool2d: 1-2                         [1, 16, 55, 55]           --
├─Conv2d: 1-3                            [1, 32, 27, 27]           12,832
├─MaxPool2d: 1-4                         [1, 32, 13, 13]           --
├─Conv2d: 1-5                            [1, 64, 13, 13]           18,496
├─MaxPool2d: 1-6                         [1, 64, 6, 6]             --
==========================================================================================
Total params: 32,544
Trainable params: 32,544
Non-trainable params: 0
Total mult-adds (M): 27.46
==========================================================================================
Input size (MB): 0.60
Forward/backward pass size (MB): 1.85
Params size (MB): 0.13
Estimated Total Size (MB): 2.58

Without doing the math, my guess would be that some of your images are not the size you think they are. Hence my suggestion earlier in this thread to actually print the image size in real time, so that when it fails you know which image size triggered the failure.

Just before you pass your input batch into your model with model(imgs[i][0]), what is the shape of imgs[i][0] in the iteration where it crashes?
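
Something like this would show it (your loop from above, with the print added; the names are the ones you already use):

res_4 = []
model_new = torch.nn.Sequential(*list(model.children())[:7])
for i in range(len(imgs)):
    # print the input shape right before the forward pass, so the failing size is visible
    print(i, imgs[i][0].shape)
    temp = model_new(imgs[i][0])
    res_4.append([temp, imgs[i][1]])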


But they are the images from Kaggle. So you think that some images could have a different shape? It works until I get to the linear layer; through the conv layers it works fine.

Sorry, I made this edit right when you posted your follow-up. This information will help you understand the error (assuming that your model sometimes runs and sometimes crashes).

This is the shape:
torch.Size([1, 3, 224, 224])
I don’t know how to print the shape of each layer’s output during inference. But I ran the model up to the last maxpool and got this output:
torch.Size([1, 64, 6, 6])
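
For what it’s worth, one common way to print every layer’s output shape during inference is to register forward hooks; a minimal sketch, assuming model is the loaded CatAndDogConvNet from above:

import torch

# register a forward hook on every leaf module so each one prints its output shape
hooks = []
for name, module in model.named_modules():
    if len(list(module.children())) == 0:   # leaf modules only (convs, pool, linears)
        hooks.append(module.register_forward_hook(
            lambda mod, inp, out, name=name: print(name, tuple(out.shape))
        ))

with torch.no_grad():
    model(torch.randn(1, 3, 224, 224))

# remove the hooks when done
for h in hooks:
    h.remove()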