```
size mismatch, m1: [16 x 4096], m2: [1024 x 3] at /pytorch/aten/src/TH/generic/THTensorMath.cpp:41
```

Without seeing the model architecture, my guess is that you are flattening the activations at some point and are not using an adaptive pooling layer, which would relax the shape condition on the input; a sketch of that fix follows below.
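Here is a minimal sketch of what I mean, assuming a toy backbone (the conv and pooling sizes here are made up to match the `1024` in-features from your error message, not taken from your model):

```python
import torch
import torch.nn as nn

# Hypothetical backbone: the real architecture wasn't posted, so these
# layer sizes are assumptions chosen to produce 64 * 4 * 4 = 1024 features.
features = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((4, 4)),  # forces a 4x4 spatial output for any input size
)
classifier = nn.Linear(64 * 4 * 4, 3)  # in_features is now fixed at 1024

for size in (224, 256):
    x = torch.randn(16, 3, size, size)
    out = classifier(torch.flatten(features(x), 1))
    print(size, out.shape)  # torch.Size([16, 3]) for both input sizes
```

With the adaptive pooling layer in place, the flattened feature size no longer depends on the input resolution, so both 224 and 256 inputs would work.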
The original shape mismatch is a 4x increase in the flattened activation size (4096 vs. 1024), so I'm wondering why changing the input from 256 to 224 would solve this issue, since that only shrinks the spatial area by a factor of (256/224)² ≈ 1.31, not 4.
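For reference, a small sketch reproducing the mismatch and the 4x arithmetic (the batch size and the two feature sizes are taken from your error message; everything else is assumed):

```python
import torch
import torch.nn as nn

fc = nn.Linear(1024, 3)       # weight corresponds to m2: [1024 x 3]
act = torch.randn(16, 4096)   # flattened activation corresponds to m1: [16 x 4096]

try:
    fc(act)
except RuntimeError as e:
    print(e)  # shape mismatch: 4096 incoming features vs. 1024 expected

# 4096 / 1024 = 4, i.e. the spatial map doubled in each dimension (2 * 2 = 4),
# while going from 256 to 224 only changes the area by (256 / 224) ** 2 ≈ 1.31.
```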