I checked your code. The tensor coming out of your average pooling layer has shape torch.Size([2, 2048, 1, 1])
for images smaller than 57x57. For larger images, the shape is something like torch.Size([2, 2048, x, x]),
where x is greater than 1. After flattening, that gives 2048 * x * x features instead of 2048, which no longer matches the next fully-connected layer, since it always expects a fixed-size input. That is why larger input images raise an error.
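Here is a minimal sketch that reproduces the mismatch. I don't have your exact model here, so the layers are assumptions: a hypothetical `nn.AvgPool2d(kernel_size=2)` followed by `nn.Linear(2048, 10)`, fed with random feature maps standing in for the backbone output. The last part shows one common way around it, `nn.AdaptiveAvgPool2d(1)`, which pools to 1x1 regardless of the input size. It may or may not suit your architecture.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the end of the network: fixed-kernel average
# pooling followed by a fully-connected layer that expects 2048 features.
pool = nn.AvgPool2d(kernel_size=2)
fc = nn.Linear(2048, 10)


def head(feature_map, pool_layer):
    pooled = pool_layer(feature_map)   # [N, 2048, x, x]
    flat = torch.flatten(pooled, 1)    # [N, 2048 * x * x]
    return fc(flat)


# Random feature maps standing in for the backbone output on a small
# and a larger input image (the spatial sizes are made up).
small = torch.randn(2, 2048, 2, 2)
large = torch.randn(2, 2048, 4, 4)

print(pool(small).shape)  # torch.Size([2, 2048, 1, 1])
print(pool(large).shape)  # torch.Size([2, 2048, 2, 2])

out_small = head(small, pool)  # works: 2048 features reach the linear layer

try:
    head(large, pool)          # 2048 * 2 * 2 = 8192 features reach the linear layer
    large_failed = False
except RuntimeError as err:
    large_failed = True
    print("shape mismatch:", err)

# Adaptive pooling always produces a 1x1 map, so the linear layer
# receives 2048 features whatever the input resolution is.
adaptive = nn.AdaptiveAvgPool2d(1)
out_large = head(large, adaptive)
print(out_large.shape)  # torch.Size([2, 10])
```

With the fixed-kernel pooling only the small feature map goes through. With the adaptive layer both sizes work, because the flattened tensor always has 2048 features.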