Resize vs accuracy

I'm stuck with something that doesn't fit common sense.
I have a set of spectrograms with shape 64x192 (a 1:3 aspect ratio).

Option 1: I first upscale them to 224x224 (cv2, bilinear interpolation) and feed the images to ResNet-50. This gives 90%+ top-1 accuracy.

Option 2: I keep the 64x192 size and replace the ResNet-50 stem (the 7x7 conv2d and the max pooling) with a single strided convolution, stride 3. The new stem's output, before the bottleneck layers, is 64x64x64 (C x H x W), which is similar to the stock ResNet-50's 64x56x56. But whatever kernel size I use (3, 5 or 7), the new model generalizes worse: top-1 accuracy plateaus at a bit over 80%.

This doesn't really make sense to me. Bilinear upscaling doesn't introduce any new information; if anything, the interpolation should make the data worse. The only explanation I can think of is that bilinear upscaling smooths pixel values and thus helps prevent overfitting, but still…
Any ideas, or anything similar from your own experience?