Why do we need to specify the input size of a linear layer after flatten?

Hello, I don’t mean to be polemical, I am just curious. I was wondering why, in PyTorch, we need to specify the input size of a linear layer. In Keras, after I flatten a layer, I can feed it to a linear layer without having to specify the input size (I assume this can be computed from the output of flatten()). This is very convenient when we want to feed images of different sizes to a VGG network and we don’t want to do any model surgery.
I know that in PyTorch we have adaptive average pooling (nn.AdaptiveAvgPool2d) exactly for this reason. However, this means that feeding a big image to a VGG in Keras or PyTorch would actually use a different model: in Keras, one without adaptive pooling; in PyTorch, one with it. If we want to mimic the Keras model, we need to change the VGG layers in PyTorch. If the linear layer could automatically compute its input size, we wouldn’t need that.
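To make the question concrete, here is a minimal sketch (my own toy example, not actual VGG) showing why the flattened size depends on the input resolution, and how adaptive pooling decouples the two:

```python
import torch
import torch.nn as nn

# A tiny conv "feature extractor": halves spatial size once.
conv = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.MaxPool2d(2))

# For a 32x32 input, conv output is 8 x 16 x 16 = 2048 features after flatten,
# so the linear layer's in_features must be hard-coded to 2048.
fc = nn.Linear(8 * 16 * 16, 10)
x = torch.randn(1, 3, 32, 32)
out = fc(torch.flatten(conv(x), 1))          # shape (1, 10)

# A 64x64 input would flatten to 8 * 32 * 32 = 8192 features and crash in fc.
# Adaptive pooling forces a fixed spatial size, so the same fc works:
pool = nn.AdaptiveAvgPool2d((16, 16))
y = torch.randn(1, 3, 64, 64)
out2 = fc(torch.flatten(pool(conv(y)), 1))   # shape (1, 10)
```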


The main reason is that PyTorch needs to create the weights when the layer is initialized (and thus needs to know the input size at that point).
This is to make sure that calls like model.cuda() or your_opt = optim.SGD(model.parameters(), ...) work as expected even before your model has seen a single sample.
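A small sketch of what that means in practice: both of the calls below iterate over the layer's parameters, so the weight tensors must already exist, which is only possible if in_features was given at construction time.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# in_features=2048 is needed so the (10 x 2048) weight matrix
# can be allocated right here, at construction time.
model = nn.Linear(2048, 10)

# Both of these walk model.parameters() before any forward pass:
opt = optim.SGD(model.parameters(), lr=0.1)  # registers existing tensors
# model.cuda()  # would move those same existing tensors to the GPU

n_params = sum(p.numel() for p in model.parameters())  # 2048*10 + 10
```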

Note that changing this is under discussion (but it is tricky to make sure it works well with the other constructs in PyTorch), and you can find the details in this issue: https://github.com/pytorch/pytorch/issues/23352


Thanks, exactly the explanation I was looking for!