Size mismatch, m1: [128 x 512], m2: [25088 x 4096]

How large is each image you pass to the model?
The first linear layer of the VGG classifier expects 25088 = 512 * 7 * 7 input features, while your activation only provides 512, which suggests your inputs are much smaller than the usual 224x224. Recent versions of torchvision.models.vgg include an adaptive pooling layer (nn.AdaptiveAvgPool2d), so the model should accept variable-sized inputs. However, if you are using an older version, your model might be missing this layer.
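For reference, here is a small sketch (plain Python, no torch needed) of where the numbers in your error come from. It assumes the standard VGG layout without the adaptive pooling layer: the feature extractor has 5 max-pool layers (so it downsamples by 32) and ends with 512 channels. An input of 32x32 would then produce exactly the 512 features in your `m1`, with 128 being the batch size:

```python
# How VGG's flattened feature size depends on the input image size,
# assuming no adaptive pooling layer (older torchvision versions).

def vgg_flat_features(image_size: int) -> int:
    """Flattened feature count fed to VGG's first Linear layer.

    The conv stack downsamples the spatial resolution by 32
    (5 max-pool layers, each stride 2) and outputs 512 channels.
    """
    spatial = image_size // 32
    return 512 * spatial * spatial

print(vgg_flat_features(224))  # 25088 -> matches nn.Linear(25088, 4096)
print(vgg_flat_features(32))   # 512   -> the "m1: [128 x 512]" in your error
```

So either resize your images to 224x224 (e.g. with `torchvision.transforms.Resize`) or update torchvision so the model includes the adaptive pooling layer.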