Hi all, I’m new to PyTorch and currently training a SegFormer model on the Cityscapes dataset.
While experimenting with a custom augmentation pipeline I found on GitHub, I noticed something curious: using `RandomResizedCrop` for the training data and `Resize` for the validation data leads to inconsistent input/mask shapes.
- My original images are of shape (1024, 2048).
- For training, I'm using `RandomResizedCrop(size=(512, 512), scale=(0.5, 2.0))`.
- For validation, I use `Resize(size=(512, 512))`.
After augmentation:
- Training images/masks are of shape (512, 512).
- Validation images/masks become (512, 1024), presumably because the aspect ratio is preserved during resizing.
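
For context, here's a minimal sketch of what I believe the pipeline is doing, rewritten with `torchvision.transforms.v2` rather than the custom classes from the GitHub repo. I'm assuming the custom `Resize` keeps the aspect ratio, which would behave like torchvision's `Resize` with a single int size:

```python
import torch
from torchvision import tv_tensors
from torchvision.transforms import v2

# Dummy image/mask pair at the original Cityscapes resolution (H=1024, W=2048).
image = tv_tensors.Image(torch.zeros(3, 1024, 2048))
mask = tv_tensors.Mask(torch.zeros(1024, 2048, dtype=torch.long))

# Training: random crop + rescale to a fixed 512x512 patch.
train_tf = v2.RandomResizedCrop(size=(512, 512), scale=(0.5, 2.0))

# Validation: assuming the custom Resize keeps the aspect ratio, it acts like
# torchvision's Resize with an int (shorter side -> 512), so 1024x2048
# comes out as 512x1024.
val_tf = v2.Resize(size=512)

train_img, train_mask = train_tf(image, mask)  # spatial size (512, 512)
val_img, val_mask = val_tf(image, mask)        # spatial size (512, 1024)
print(train_mask.shape, val_mask.shape)
```

(Note that in plain torchvision, `Resize(size=(512, 512))` with a tuple forces both dimensions to 512 and does not preserve the aspect ratio; only the single-int form resizes the shorter side, which is what seems to match the (512, 1024) output I'm seeing.)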
I’ve read that maintaining aspect ratio is important for semantic segmentation, but I’m now wondering:
How important is it to also maintain consistent input sizes between training and validation? Will this difference in shape negatively affect evaluation or model performance?
Thanks in advance!