Without seeing the model architecture, my guess is that you are flattening the activations at some point and are not using an adaptive pooling layer, which would relax the shape condition.
The original shape mismatch was a 4x increase in the activation shape, whereas going from an input size of 256 to 224 only scales the spatial activations by a factor of (224/256)² ≈ 0.77, so I'm wondering why this change would solve the issue.
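To illustrate what I mean by the adaptive pooling approach, here is a minimal sketch (the conv stack and layer sizes are placeholders, not your actual model): placing an `nn.AdaptiveAvgPool2d` before the flatten fixes the spatial size of the activation, so the `in_features` of the linear layer no longer depend on the input resolution.

```python
import torch
import torch.nn as nn

# Placeholder model: a small conv stack followed by adaptive pooling,
# which outputs a fixed spatial size (here 1x1) for any input resolution.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d((1, 1)),  # activation becomes [batch, 32, 1, 1]
    nn.Flatten(),                  # -> [batch, 32] regardless of input size
    nn.Linear(32, 10),
)

# Both input resolutions now work without changing the linear layer.
for size in (224, 256):
    x = torch.randn(1, 3, size, size)
    out = model(x)
    print(size, out.shape)  # torch.Size([1, 10]) in both cases
```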