Trying to understand CNN input shapes

Hi everyone, long time TF user here hoping to switch over to PyTorch. I’m trying to mimic a CNN I wrote with Keras and am running into quite a bit of trouble.

First off, I am trying to feed in pre-made numpy arrays (ran into a host of problems trying to make my own dataset class with the arrays as is, so I figured it would be better to just feed them in more directly) using:

my_dataset = TensorDataset(x_train, y_train)
training_loader = DataLoader(my_dataset)

The numpy arrays are of the shape (# of samples, # of channels, height, width) - does simply feeding in batch_size=256 as an argument to DataLoader take care of the whole batch size problem for me? I have been unsure about this.
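On the batch-size question, here is a minimal sketch of the loading step. It assumes x_train and y_train are float NumPy arrays; the random arrays and their shapes below are hypothetical stand-ins, not the real data. TensorDataset expects tensors, so the arrays are converted first, and batch_size=256 is then all DataLoader needs to stack 256 samples along dimension 0.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the real arrays: 1000 single-channel 40x20 images.
x_train = np.random.rand(1000, 1, 40, 20).astype(np.float32)
y_train = np.random.randint(0, 2, size=(1000, 1)).astype(np.float32)

# TensorDataset expects tensors, so convert the NumPy arrays first.
my_dataset = TensorDataset(torch.from_numpy(x_train), torch.from_numpy(y_train))

# batch_size sets how many samples are stacked along dim 0 of each batch.
training_loader = DataLoader(my_dataset, batch_size=256, shuffle=True)

xb, yb = next(iter(training_loader))
print(xb.shape)  # torch.Size([256, 1, 40, 20])
print(yb.shape)  # torch.Size([256, 1])
```

The last batch of an epoch is smaller when the sample count is not a multiple of 256, unless drop_last=True is passed.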

I am unsure how to calculate the input shapes for each layer in my CNN - I’ve read some threads on here about it but they all solve the problem directly and don’t actually explain HOW to do this. I am getting a size mismatch runtime error at my first fully connected layer, which in my case appears after a max pooling layer. How can I go about calculating this?

For reference I will include the error message and my code:

RuntimeError: size mismatch, m1: [1 x 2302720], m2: [768 x 120] at /pytorch/aten/src/THC/generic/THCTensorMathBlas.cu:290

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 2, 1)
        self.conv2 = nn.Conv2d(32, 32, 2, 1)
        
        self.conv3 = nn.Conv2d(32, 64, 3, 1)
        self.conv4 = nn.Conv2d(64, 64, 3, 1)
        
        self.fc1 = nn.Linear(896, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 1)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2, 2)
        x = torch.flatten(x)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.sigmoid(self.fc3(x))        
        return x

Thank you for any help!!

You call torch.flatten(x) in your code, which flattens every dimension of x, including the batch dimension, into a single vector. To keep the batch dimension,

replace x = torch.flatten(x) with x = x.reshape(x.shape[0], -1)

This guarantees that the batch dimension is preserved when the input is fed into the Linear layer.
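To illustrate the difference, here is a small sketch on a dummy tensor. The shape (256, 64, 7, 2) is only an assumption chosen for illustration. torch.flatten(x, 1), which starts flattening at dimension 1, is an equivalent alternative to the reshape.

```python
import torch

# Dummy activation with shape (batch, channels, height, width).
x = torch.randn(256, 64, 7, 2)

flat_all = torch.flatten(x)            # collapses every dim, batch included
flat_keep = x.reshape(x.shape[0], -1)  # keeps dim 0, flattens the rest
flat_start = torch.flatten(x, 1)       # same result: flatten from dim 1 onward

print(flat_all.shape)    # torch.Size([229376])
print(flat_keep.shape)   # torch.Size([256, 896])
print(flat_start.shape)  # torch.Size([256, 896])
```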

Thanks for your reply. I am still getting a size mismatch error though - I’m assuming this is due to improperly calculating the input dimensions for my layers? Do you have any tips for how to do this?

For reference, the error message is: size mismatch, m1: [1028 x 2240], m2: [896 x 120]

And I am starting with a single channel 40x20 pixel image, trying to use the architecture described in my original post.

Hi, assuming you have made the replacement that @Abdulrahman suggested:
for the flattened output to pass through fc1, it must be a tensor of size (batch_size, 896).

Since you are using fully connected layers, the network only works for one specific input size.
You need to find out what that input size is.
Can you check what the output shape is after flattening?
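The size can also be worked out by hand. For an unpadded, undilated convolution or max pooling layer, each spatial dimension becomes floor((size - kernel) / stride) + 1. Below is a sketch of that arithmetic for the layers in the original post; conv_out and flattened_features are hypothetical helper names, not PyTorch functions.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    """Output length of one spatial dim for Conv2d / max_pool2d (floor mode)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def flattened_features(height, width, out_channels=64):
    """Number of features entering fc1 for the architecture in the post."""
    # (kernel, stride) for conv1, conv2, pool, conv3, conv4, pool
    layers = [(2, 1), (2, 1), (2, 2), (3, 1), (3, 1), (2, 2)]
    h, w = height, width
    for kernel, stride in layers:
        h = conv_out(h, kernel, stride)
        w = conv_out(w, kernel, stride)
    return out_channels * h * w


print(flattened_features(40, 20))  # 896  (64 channels * 7 * 2)
```

The 2240 in the error message equals 64 * 35, which suggests the images reaching the model are not 40x20. As one possibility only, flattened_features(40, 32) gives 64 * 7 * 5 = 2240.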

Thanks for your help. How can I check that? That’s my main question really.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 2, 1)
        self.conv2 = nn.Conv2d(32, 32, 2, 1)
        
        self.conv3 = nn.Conv2d(32, 64, 3, 1)
        self.conv4 = nn.Conv2d(64, 64, 3, 1)
        
        self.fc1 = nn.Linear(896, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 1)
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = F.max_pool2d(x, 2, 2)
        # x = torch.flatten(x)
        x = x.reshape(x.shape[0], -1)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = torch.sigmoid(self.fc3(x))  # F.sigmoid is deprecated
        return x


batch  = 256
channels = 1

data = torch.randn((batch,channels,40,20))

model = MyModel()

pred = model(data)

print(pred.shape)

Hi, I reproduced your code with the parameters you shared, and everything works fine.

In the forward function you can add print(x.shape) anywhere you want to check the size. For example, to see the output after x = x.reshape(x.shape[0], -1), add print(x.shape) on the next line (before x = F.relu(self.fc1(x))).
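Another way to get the same information without editing forward is to push a dummy input through the convolutional part and read off the shapes. This is only a sketch: it rebuilds the conv and pooling layers from the post as an nn.Sequential (the names features and n_features are made up here) and assumes a single-channel 40x20 input.

```python
import torch
import torch.nn as nn

# Convolutional part of the model from the post, rebuilt as a Sequential.
features = nn.Sequential(
    nn.Conv2d(1, 32, 2, 1), nn.ReLU(),
    nn.Conv2d(32, 32, 2, 1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
    nn.Conv2d(32, 64, 3, 1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, 1), nn.ReLU(),
    nn.MaxPool2d(2, 2),
)

with torch.no_grad():
    x = torch.zeros(1, 1, 40, 20)  # one dummy single-channel 40x20 image
    for layer in features:
        x = layer(x)
        print(f"{layer.__class__.__name__:<10} -> {tuple(x.shape)}")

# Features per sample after flattening; use this as in_features of fc1.
n_features = x.flatten(1).shape[1]
print(n_features)  # 896
fc1 = nn.Linear(n_features, 120)
```

Running the same loop on one real batch from the DataLoader shows which size fc1 actually has to accept.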