SegFormer Input Size Consistency: RandomResizedCrop vs Resize

Hi all, I’m new to PyTorch and currently training a SegFormer model on the Cityscapes dataset.

While experimenting with a custom augmentation pipeline I found on GitHub, I noticed something curious: using RandomResizedCrop for training data and Resize for validation data leads to inconsistent input/mask shapes.

  • My original images are of shape (1024, 2048).
  • For training, I’m using RandomResizedCrop(size=(512, 512), scale=(0.5, 2.0)).
  • For validation, I use Resize(size=(512, 512)).

After augmentation:

  • Training images/masks are of shape (512, 512).
  • Validation images/masks become (512, 1024), presumably because the aspect ratio is being preserved during resizing (see the sketch below).
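
Here's a minimal sketch of what I mean (assuming torchvision.transforms.v2; the pipeline I found may use something else). While writing it I noticed that Resize treats a single int differently from an (h, w) pair, and only the int form reproduces the (512, 1024) shape:

```python
import torch
from torchvision.transforms import v2

img = torch.zeros(3, 1024, 2048)  # dummy (C, H, W) tensor, Cityscapes-sized

train_tf = v2.RandomResizedCrop(size=(512, 512), scale=(0.5, 2.0))
val_tf_pair = v2.Resize(size=(512, 512))  # (h, w) pair: exact size, aspect ratio not kept
val_tf_int = v2.Resize(size=512)          # single int: short side -> 512, aspect ratio kept

print(train_tf(img).shape)     # torch.Size([3, 512, 512])
print(val_tf_pair(img).shape)  # torch.Size([3, 512, 512])
print(val_tf_int(img).shape)   # torch.Size([3, 512, 1024])
```

So if the GitHub pipeline passes 512 as a single int somewhere, that would explain the validation shape.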

I’ve read that maintaining aspect ratio is important for semantic segmentation, but I’m now wondering:
How important is it to also maintain consistent input sizes between training and validation? Will this difference in shape negatively affect evaluation or model performance?

Thanks in advance!

I couldn't reproduce these results on Colab.
But for semantic segmentation it is very important that you keep the aspect ratio, so the model can pick up the patterns and learn the features of the dataset. The model learns both global and local features, so if the geometry is distorted differently at train and validation time, it may perform poorly. The best experiment is to try both ways while tracking the metrics; after that you will have the real answer. The results will also depend on what your data looks like.
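
As a rough sketch of what I mean by tracking the metrics (assuming torchmetrics for the mean IoU and a model that returns raw per-pixel logits; with the Hugging Face SegFormer you would take outputs.logits instead):

```python
import torch
import torch.nn.functional as F
from torchmetrics import JaccardIndex

@torch.no_grad()
def evaluate(model, loader, num_classes=19, device="cuda"):
    # Mean IoU over the validation set; 255 is the Cityscapes ignore label.
    miou = JaccardIndex(task="multiclass", num_classes=num_classes,
                        ignore_index=255).to(device)
    model.eval()
    for images, masks in loader:
        logits = model(images.to(device))
        # Upsample logits to the mask resolution before scoring, so runs
        # with different validation input sizes stay comparable.
        logits = F.interpolate(logits, size=masks.shape[-2:],
                               mode="bilinear", align_corners=False)
        miou.update(logits.argmax(dim=1), masks.to(device))
    return miou.compute().item()

# Build one loader per validation pipeline, e.g. Resize((512, 512)) vs
# Resize(512), then compare evaluate(model, loader_exact) against
# evaluate(model, loader_keep_ratio) on your own data.
```

Run both and let the mIoU decide.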

Thanks a lot, I get the idea now: what matters more is keeping the aspect ratio rather than keeping the same input shape in the end.
