I am working on a fully convolutional autoencoder, which I train on 256x256 patches of my 1024x1024 images, using torchvision's RandomCrop transform for on-the-fly data augmentation.
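Here's a minimal sketch of what the augmentation looks like (dataset wiring omitted; this is illustrative, not my exact code):

```python
import torchvision.transforms as T

# On-the-fly augmentation: a fresh random 256x256 patch is sampled
# from each 1024x1024 image every time it is loaded.
train_transform = T.Compose([
    T.RandomCrop(256),
    T.ToTensor(),
])
```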
I see very different results when I use the model to predict a single 256x256 patch versus the whole 1024x1024 source image. From my understanding, since convolutional kernels are local and the scale between the patches and the full image is preserved, the learned weights should recognize the same patterns in both cases.
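To make the comparison concrete, here is roughly how I check the two predictions against each other (the model below is just a placeholder fully convolutional stack, not my actual architecture):

```python
import torch
import torch.nn as nn

# Placeholder fully convolutional autoencoder; my real model is deeper,
# but any conv stack illustrates the comparison.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, kernel_size=4, stride=2, padding=1),
).eval()

full = torch.rand(1, 3, 1024, 1024)   # stand-in for a full source image
patch = full[:, :, :256, :256]        # the corresponding top-left 256x256 patch

with torch.no_grad():
    from_full = model(full)[:, :, :256, :256]  # region predicted within the full image
    from_patch = model(patch)                  # same region predicted in isolation

# Max absolute difference over the shared region
print((from_full - from_patch).abs().max())
```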
Am I missing something?
Thanks in advance!