# How to know the input shape of model

I looked into the mnist example, and print the model (`print(model)`) which shows

```
Net(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1))
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1))
  (dropout1): Dropout(p=0.25, inplace=False)
  (dropout2): Dropout(p=0.5, inplace=False)
  (fc1): Linear(in_features=9216, out_features=128, bias=True)
  (fc2): Linear(in_features=128, out_features=10, bias=True)
)
```

However, this doesn’t tell us what the input shape is. In mnist, the shape is `(1, 1, 28, 28)`, but how do we know the input shape from the model definition (let’s say we don’t know the model is for mnist)? I couldn’t find any info about it.

Thanks!

Hi Ghost!

This only shows us the `Module`s in `model`, not the `forward()`
function that “glues” the `Module`s together and that may well perform
other non-trivial processing. Let’s analyze this by making the simplest
reasonable assumptions.

`fc1` has `in_features = 9216`. There must be some sort of `flatten()`
or `reshape` just before this. Convolutions work on images of
arbitrary sizes. The output of `conv2` has `out_channels = 64`, so
the output of `conv2` has shape `[batch_size, 64, H, W]`. Assuming
(correctly) that the batch size just goes along for the ride, after the
`flatten()` operation, we have `64 * H * W` “features” to pass into
`fc1`.

Therefore `H * W` must be `9216 / 64 = 144`. Guessing that the
image is square, the output of `conv2` would have shape `[64, 12, 12]`.
(It doesn’t have to be square; it could have a shape of, say, `[64, 9, 16]`.)
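The arithmetic above can be checked with a few lines of plain Python (the numbers come from the printed layers; the flatten is an assumption):

```python
# in_features of fc1 divided by out_channels of conv2 gives the
# number of spatial positions (H * W) feeding the assumed flatten
in_features = 9216   # from fc1
out_channels = 64    # from conv2

spatial = in_features // out_channels   # H * W
assert spatial == 144

# enumerate the (H, W) pairs consistent with H * W == 144
candidates = [(h, spatial // h) for h in range(1, spatial + 1)
              if spatial % h == 0]
print(candidates)  # includes the square guess (12, 12) and (9, 16)
```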

Each `Conv2d` layer (no padding, `kernel_size = 3`, `stride = 1`)
trims two rows of pixels off of the image. Therefore the input to
`conv1` must have shape `[batch_size, 1, 16, 16]`.
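A quick sanity check of that deduction, rebuilding just the relevant layers (a sketch; the `flatten` gluing `conv2` to `fc1` is assumed, since we can’t see the real `forward()`):

```python
import torch
import torch.nn as nn

# the convolutional and first linear layers from the printed model
conv1 = nn.Conv2d(1, 32, kernel_size=3)   # trims 2 pixels per dimension
conv2 = nn.Conv2d(32, 64, kernel_size=3)  # trims 2 more
fc1 = nn.Linear(9216, 128)

x = torch.randn(1, 1, 16, 16)             # the deduced input shape
y = conv2(conv1(x))
print(y.shape)                            # torch.Size([1, 64, 12, 12])

z = fc1(torch.flatten(y, 1))              # 64 * 12 * 12 == 9216 features
print(z.shape)                            # torch.Size([1, 128])
```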

In general, there will be multiple places in a model where the shape
of the tensor is constrained. (In this case, the input to `fc1` has to
have `in_features = 9216`.) You then work backwards from the
constraint to see which input shapes would be valid for your model.
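You can also do this empirically: probe the model with dummy inputs and let the shape-mismatch errors tell you which inputs are invalid. A sketch, with the model rebuilt from the printed layers and an assumed `Flatten`:

```python
import torch
import torch.nn as nn

# reconstruction of the printed layers, glued with an assumed Flatten
model = nn.Sequential(
    nn.Conv2d(1, 32, 3), nn.ReLU(),
    nn.Conv2d(32, 64, 3), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(9216, 128), nn.ReLU(),
    nn.Linear(128, 10),
)

valid = []
for size in range(8, 33):                 # try square inputs 8x8 .. 32x32
    try:
        model(torch.randn(1, 1, size, size))
        valid.append(size)                # no error: this shape works
    except RuntimeError:
        pass                              # fc1's in_features constraint violated

print(valid)  # only 16 survives: 64 * (16 - 4)**2 == 9216
```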

Based on our assumptions, `[1, 1, 28, 28]` wouldn’t be valid for this
model. After the first two `Conv2d` layers, that shape would become
`[1, 64, 24, 24]`. You could hypothetically have a factor-of-two
downsampling layer between `conv2` and `fc1` to take you down to
`[1, 64, 12, 12]`, but you would typically perform downsampling
between convolutional layers, rather than after them.
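That hypothetical downsampling is easy to try out. A sketch: inserting a `MaxPool2d(2)` between `conv2` and the (assumed) flatten makes `[1, 1, 28, 28]` satisfy `fc1`’s constraint:

```python
import torch
import torch.nn as nn

# a factor-of-two pool halves the 24x24 map to 12x12,
# so a 28x28 input now yields the 9216 features fc1 expects
model = nn.Sequential(
    nn.Conv2d(1, 32, 3),   # 28 -> 26
    nn.Conv2d(32, 64, 3),  # 26 -> 24
    nn.MaxPool2d(2),       # 24 -> 12
    nn.Flatten(),
    nn.Linear(9216, 128),
    nn.Linear(128, 10),
)

out = model(torch.randn(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10])
```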

(You could also have an `AdaptiveAvgPool2d((12, 12))` between
`conv2` and `fc1`, which would be a pretty common way to make the
model much more flexible with respect to input shape.)
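A sketch of that adaptive variant: any input large enough for the convolutions gets pooled to a fixed 12 × 12, so `fc1` always sees 9216 features regardless of input size:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 32, 3),
    nn.Conv2d(32, 64, 3),
    nn.AdaptiveAvgPool2d((12, 12)),  # always outputs 12x12 feature maps
    nn.Flatten(),                    # 64 * 12 * 12 == 9216, whatever the input
    nn.Linear(9216, 128),
    nn.Linear(128, 10),
)

for size in (16, 28, 64):            # several different input sizes all work
    out = model(torch.randn(1, 1, size, size))
    print(size, out.shape)           # each prints torch.Size([1, 10])
```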

Best.

K. Frank

Thanks for the detailed explanation!