A network whose first layer is Conv2d expects input of shape (batch_size, n_channels, height, width).
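
For example, here is a minimal sketch (the channel counts and kernel size are arbitrary, chosen purely for illustration):

```python
import torch
import torch.nn as nn

# First layer is Conv2d: it expects 4-D input of shape
# (batch_size, n_channels, height, width).
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3)

x = torch.randn(8, 3, 64, 64)  # batch of 8 RGB images, 64x64 pixels
out = conv(x)
print(out.shape)  # torch.Size([8, 16, 62, 62]): unpadded 3x3 conv shrinks H and W by 2
```
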
Since convolutional layers in PyTorch are dynamic with respect to spatial dimensions, there is no straightforward way to query the intended/expected height and width. In fact, a module composed entirely of convolutions will accept almost any image size, provided the spatial dimensions remain valid (i.e. stay positive) after each unpadded convolution, pooling step, etc.
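
To illustrate, a purely convolutional stack (again with arbitrary, illustrative layer parameters) happily consumes several different image sizes:

```python
import torch
import torch.nn as nn

fully_conv = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),  # unpadded: shrinks H and W by 2
    nn.ReLU(),
    nn.MaxPool2d(2),                  # halves H and W
    nn.Conv2d(16, 32, kernel_size=3),
)

# Different image sizes all pass through, as long as the spatial
# dimensions stay positive after each unpadded convolution/pooling.
for h, w in [(32, 32), (64, 48), (100, 200)]:
    out = fully_conv(torch.randn(1, 3, h, w))
    print((h, w), '->', tuple(out.shape))
```
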
If the network subsequently contains a layer with a fixed input size, e.g. a Linear layer, the constraint is only on the flattened feature count: where height and width were the intended input dimensions, an image of size (height/n, n*width) can yield the same number of flattened features and so should also be acceptable input to the network (subject to the same conditions above).
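
As a sketch of this (a hypothetical network, using a shape-preserving convolution with kernel_size=3 and padding=1 so the relationship holds exactly), the Linear layer's in_features depends only on the product height * width, which (height/n, n*width) leaves unchanged:

```python
import torch
import torch.nn as nn

height, width = 32, 32  # the intended input dimensions

net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),  # shape-preserving conv
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * height * width, 10),  # fixed flattened feature count
)

# The intended size works...
print(net(torch.randn(1, 3, height, width)).shape)  # torch.Size([1, 10])

# ...but so does (height/n, n*width), since the flattened feature
# count 8 * (height/n) * (n*width) is unchanged. Here n = 2.
print(net(torch.randn(1, 3, height // 2, 2 * width)).shape)  # torch.Size([1, 10])
```
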