Convolutional model yields different outputs on parts of image and whole image

Julien_Despois · October 9, 2019, 1:34pm

Hi there,

I am working on a fully convolutional autoencoder, which I train on 256x256 patches of my 1024x1024 images, cropped on the fly with torchvision's RandomCrop transform.

I see very different results when using my model to predict one 256x256 patch, versus the whole 1024x1024 source image. From my understanding the local weights of the convolutional kernels should learn to recognize the same patterns in both cases as the scale between the patches and full image is conserved.

Am I missing something?

Thanks in advance!

Julien

ptrblck · October 9, 2019, 4:46pm

I’m not sure I’m understanding this claim correctly, but how would the kernels preserve the scale, if you increase the spatial resolution by 4x?

Julien_Despois · October 9, 2019, 5:23pm

I mean that if I take a 1024x1024 image, split it into 256x256 chunks and train on these chunks, then running the trained fully convolutional model on the full 1024x1024 image should work. Each feature map would simply be 4 times as wide and 4 times as high as during training, but that shouldn’t matter, since at the kernel level (say 5x5) the resolution is still the same. I should get an output of size 1024x1024 (which I do) instead of the 256x256 I get in training. No?
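
The size argument above can be checked with plain arithmetic. The following is a minimal sketch, assuming the layer hyperparameters from the Generator printout posted later in this thread; `conv_out` and `generator_out` are hypothetical helper names, not PyTorch functions, and the sketch tracks one spatial dimension only.

```python
def conv_out(size, kernel, stride, padding):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def generator_out(size):
    """Trace one spatial dimension through the layers of the posted Generator."""
    # Encoder: one 7x7 stride-1 conv, then two 4x4 stride-2 convs.
    size = conv_out(size, 7, 1, 3)
    size = conv_out(size, 4, 2, 1)
    size = conv_out(size, 4, 2, 1)
    # Six residual blocks, each with two size-preserving 3x3 convs.
    for _ in range(6 * 2):
        size = conv_out(size, 3, 1, 1)
    # Decoder: (nearest upsample x2, 5x5 conv) twice, then a 7x7 conv.
    for _ in range(2):
        size = conv_out(size * 2, 5, 1, 2)
    return conv_out(size, 7, 1, 3)


for s in (256, 512, 1024):
    print(s, "->", generator_out(s))
```

This only confirms that the output has the same spatial size as the input for these three resolutions. It says nothing about whether the output values match.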

ptrblck · October 9, 2019, 6:37pm

Thanks for clarifying the use case; I had misunderstood it.
Yes, you should get the same output (apart from the edges of the smaller patches, which are affected by padding).

Could you post the code you are using to recreate the larger output?
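
The padding caveat can be illustrated with a single zero-padded convolution. This is a pure-Python 1D sketch, not the model from this thread: the kernel values, the signal and the helper `conv1d_same` are all made up for illustration.

```python
def conv1d_same(signal, kernel):
    """Zero-padded 'same' 1D cross-correlation for an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


kernel = [1.0, 2.0, 3.0, 2.0, 1.0]                # hypothetical 5-tap kernel
full = [float((7 * i) % 11) for i in range(32)]   # hypothetical image row
start, size = 8, 16
patch = full[start:start + size]                  # crop taken from the row

out_patch = conv1d_same(patch, kernel)
crop_of_full = conv1d_same(full, kernel)[start:start + size]

pad = len(kernel) // 2
# Away from the patch border both computations see the same input values.
interior_equal = out_patch[pad:-pad] == crop_of_full[pad:-pad]
# At the border the patch sees zero padding where the full row has real data.
border_equal = out_patch[:pad] == crop_of_full[:pad]
print("interior equal:", interior_equal)
print("border equal:", border_equal)
```

In a deep network the affected border is wider than for one layer, because it grows with the receptive field of the stacked convolutions.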

Julien_Despois · November 8, 2019, 4:43pm

I do indeed see differences when the input size changes. Sliding a patch over the large image works fine, but as soon as I increase the input from 256px (the patch size during training) to 512px and then to the full 1024px image, the output of the StarGAN progressively degrades. Here is the model I use. Could it be because of the InstanceNorm2d layers?

Generator(
  (encoder): Sequential(
    (0): Conv2d(5, 64, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
    (1): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace)
    (3): Conv2d(64, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (4): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace)
    (6): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (7): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (8): ReLU(inplace)
    (9): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (10): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (11): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (12): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (13): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (14): ResidualBlock(
      (main): Sequential(
        (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU(inplace)
        (3): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (4): InstanceNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
  )
  (decoder): Sequential(
    (0): Upsample(scale_factor=2, mode=nearest)
    (1): Conv2d(256, 128, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (2): InstanceNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (3): ReLU(inplace)
    (4): Upsample(scale_factor=2, mode=nearest)
    (5): Conv2d(128, 64, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (6): InstanceNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (7): ReLU(inplace)
    (8): Conv2d(64, 4, kernel_size=(7, 7), stride=(1, 1), padding=(3, 3))
    (9): Tanh()
  )
)

Thanks again!
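
On the InstanceNorm2d question: instance normalization, when it uses per-instance statistics, computes the mean and variance of each channel over the whole spatial extent of the input, so a pixel's normalized value depends on everything else in the image. The sketch below shows that effect in pure Python on a made-up 1D channel. It is not PyTorch's implementation, and `instance_norm` is a hypothetical helper. Note that with `track_running_stats=True`, as in the printout above, PyTorch's layer uses per-instance statistics in training mode and running estimates in evaluation mode, and the thread does not say which mode was used for prediction.

```python
def instance_norm(values, eps=1e-5):
    """Normalize one channel with its own mean and biased variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# Hypothetical channel: a dark textured half followed by a bright textured half.
full = [float(i % 4) for i in range(16)] + [float(10 + i % 4) for i in range(16)]
patch = full[:16]  # crop covering only the dark half

norm_patch = instance_norm(patch)
norm_full_cropped = instance_norm(full)[:16]

# The same input pixels get different normalized values, because the
# statistics of the full channel include the bright half.
max_diff = max(abs(a - b) for a, b in zip(norm_patch, norm_full_cropped))
print("largest difference on the shared pixels:", max_diff)
```

If per-instance statistics are in play, this dependence on global image content is one plausible reason why the output changes with input size even though the convolutions themselves are translation-equivariant.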