Linear layer input neurons number calculation after conv2d

ptrblck · November 3, 2018, 1:16pm

Your input shape seems to be a bit wrong, as it looks like the channels are in the last dimension.
In PyTorch, image data is expected to have the shape [batch_size, channel, height, width].
Based on your shape, I guess 36 is the batch_size, while 3 seems to be the number of channels.
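If your data really is channels-last, you can move the channel dimension to dim1 with `permute`. Here is a minimal sketch, assuming a hypothetical input of shape [36, 200, 150, 3]. The 200x150 spatial size is only a placeholder, since the full original shape isn't shown here:

```python
import torch

# Hypothetical channels-last batch: [batch_size, height, width, channels]
x = torch.randn(36, 200, 150, 3)

# Reorder to the layout PyTorch expects: [batch_size, channels, height, width]
x = x.permute(0, 3, 1, 2).contiguous()
print(x.shape)  # torch.Size([36, 3, 200, 150])
```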

However, as your model expects 32 input channels, your input won’t work at all currently.

Let’s just assume we are using an input of [1, 32, 200, 150] and walk through the model and the shapes.
Since your nn.Conv2d layers use kernel_size=3 without padding and with the default stride of 1, each conv layer removes one pixel at each border, so the activation loses two pixels in both spatial dimensions.
After the first conv layer your activation will be [1, 64, 198, 148], after the second [1, 128, 196, 146].
nn.MaxPool2d(2) will then halve the spatial size, giving [1, 128, 98, 73].
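The same walk-through can be done with the output-size formula from the nn.Conv2d / nn.MaxPool2d docs. This is a small plain-Python sketch; kernel_size=3 for both conv layers is an assumption inferred from the 200 -> 198 step above:

```python
def conv2d_out(size, kernel_size, stride=1, padding=0, dilation=1):
    # Output length of one spatial dim for nn.Conv2d
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

def maxpool2d_out(size, kernel_size, stride=None, padding=0, dilation=1):
    # nn.MaxPool2d uses stride=kernel_size by default (ceil_mode=False)
    stride = kernel_size if stride is None else stride
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

h, w = 200, 150
h, w = conv2d_out(h, 3), conv2d_out(w, 3)        # first conv (assumed k=3)  -> 198, 148
h, w = conv2d_out(h, 3), conv2d_out(w, 3)        # second conv (assumed k=3) -> 196, 146
h, w = maxpool2d_out(h, 2), maxpool2d_out(w, 2)  # MaxPool2d(2)              -> 98, 73
in_features = 128 * h * w
print(h, w, in_features)  # 98 73 915712
```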

If you set the number of in_features for the first linear layer to 128*98*73 (= 915712), your model will work for my input.
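To check this end to end, here is a hypothetical reconstruction of such a model. The kernel sizes, the ReLU activations and the 10 output classes are assumptions for illustration, not taken from the original model:

```python
import torch
import torch.nn as nn

# Hypothetical model: kernel_size=3, ReLU and 10 classes are assumptions
model = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3),   # -> [1, 64, 198, 148]
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3),  # -> [1, 128, 196, 146]
    nn.ReLU(),
    nn.MaxPool2d(2),                    # -> [1, 128, 98, 73]
    nn.Flatten(),                       # -> [1, 128*98*73]
    nn.Linear(128 * 98 * 73, 10),
)

x = torch.randn(1, 32, 200, 150)
out = model(x)
print(out.shape)  # torch.Size([1, 10])
```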

If the shape calculation is too cumbersome, I also recommend just printing the shape of your activation before the linear layer and setting the input features accordingly.
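Instead of reading the printed shape and hard-coding it, you could also pass a dummy input through the conv part once and use the result directly. A minimal sketch, assuming the same hypothetical conv stack as above (kernel_size=3, ReLU) and 10 output classes:

```python
import torch
import torch.nn as nn

# Hypothetical feature extractor (kernel_size=3 and ReLU are assumptions)
features = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3),
    nn.ReLU(),
    nn.MaxPool2d(2),
)

# Run one dummy input of the expected size without tracking gradients
# and read off the flattened feature size
with torch.no_grad():
    n_features = features(torch.zeros(1, 32, 200, 150)).flatten(1).size(1)

print(n_features)  # 915712
classifier = nn.Linear(n_features, 10)  # 10 output classes is an assumption
```

Recent PyTorch versions also provide nn.LazyLinear, which infers in_features on the first forward pass.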
For your Sequential model you can just create a print layer with:

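import torch.nn as nn

class Print(nn.Module):
    # Identity layer: prints the shape of its input and passes it through unchanged
    def forward(self, x):
        print(x.size())
        return x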
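You can then drop the layer into your nn.Sequential right before the linear layer. A small usage sketch, assuming the same hypothetical conv stack (kernel_size=3) and the [1, 32, 200, 150] input from above:

```python
import torch
import torch.nn as nn

class Print(nn.Module):
    # Identity layer: prints the shape of its input and passes it through unchanged
    def forward(self, x):
        print(x.size())
        return x

model = nn.Sequential(
    nn.Conv2d(32, 64, kernel_size=3),
    nn.Conv2d(64, 128, kernel_size=3),
    nn.MaxPool2d(2),
    Print(),       # prints torch.Size([1, 128, 98, 73])
    nn.Flatten(),
)
out = model(torch.randn(1, 32, 200, 150))
```

Once you know the shape, you can remove the Print layer and add the nn.Linear with the matching in_features.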