Hi,
Is there a way to make torch.nn.functional.interpolate deterministic? Its backward pass is not deterministic. This seems to be the case when upscaling; downscaling seems to be deterministic.
When using the CUDA backend, this operation may induce nondeterministic behaviour in its backward pass that is not easily switched off. Please see the notes on Reproducibility for background.
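One way to see whether this affects a given setup is to repeat the backward pass with a fixed upstream gradient and compare the input gradients bit for bit. The snippet below is only a minimal sketch: the helper name `backward_is_repeatable`, the 15x15 input, and the bilinear mode are assumptions chosen for illustration, and the result can differ between devices and PyTorch versions.

```python
import torch
import torch.nn.functional as F


def backward_is_repeatable(device="cpu", scale=32, trials=5):
    """Return True if the input gradient of F.interpolate is bit-identical across runs.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    torch.manual_seed(0)
    x = torch.randn(1, 1, 15, 15, device=device, requires_grad=True)
    # Fixed upstream gradient, so only the backward kernel itself can vary.
    grad_out = torch.randn(1, 1, 15 * scale, 15 * scale, device=device)

    reference = None
    for _ in range(trials):
        y = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
        (grad_x,) = torch.autograd.grad(y, x, grad_out)
        if reference is None:
            reference = grad_x.clone()
        elif not torch.equal(reference, grad_x):
            return False
    return True


device = "cuda" if torch.cuda.is_available() else "cpu"
print(device, backward_is_repeatable(device))
```

A `True` result only means no difference showed up in these few trials; it is not a guarantee of determinism.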
Any news on this? torch.nn.ConvTranspose2d does not work as well as torch.nn.functional.interpolate. I want to upscale some feature maps in one step (in a non-parametric way) without applying transposed convolution many times! The latter seems to lose information.
torch.nn.functional.interpolate seems to work but it is not deterministic (unstable results)
torch.nn.ConvTranspose2d does not seem to work, but it is deterministic (stable results).
If you want stable results, I am afraid you should use nn.ConvTranspose2d instead of interpolate. I agree with your point that transposed convolution loses information.
In my evaluation, transposed convolution loses more information as the upsampling rate grows (because of how it is implemented). To counteract the information loss, you could add an additional block that refines the output of the transposed convolution.
But that is more expensive than the parameter-free interpolate operation.
Hi @MariosOreo ,
Thanks for your comment!
Unfortunately, in my case, the upscaling factor is huge (x32), from a single map of 15x15. I assume that's why F.interpolate is extremely unstable, and it is also what makes transposed convolution (as you mentioned) lose information (it requires 4 transposed convolutions). In my case, for various reasons, learning is done without an output target, which makes the loss of spatial information worse (I think that most architectures that rely on upscaling, in segmentation for instance, work because the supervised target provided at the output allows the transposed convolution to learn something spatial). In my experiments, the transposed convolution entirely loses the spatial information that was present in the 15x15 map (compared to F.interpolate, which nicely preserves the learned abstract spatial activations).
I wonder if people working on GANs and super-resolution come across this instability issue with interpolation? (By the way, it makes learning unstable and difficult, since a non-deterministic backward pass makes the gradient non-deterministic as well.)
I checked some super-resolution (SR) papers, and it seems that interpolation and transposed convolution are the tools of choice. However, the interpolation is often applied to the input image, which is then fed to the network (I do not think they include it in the training graph, so it does not have an impact on stability).
Actually, I am exploring another tool that mimics interpolation and that was developed within the SR domain: pixel shuffle (paper). If one wants to upscale a map of (h, w) by a factor of R, one needs R^2 feature maps to obtain an upscaled map of (R*h, R*w). I am doing this in two ways:
1. The way it is supposed to be done: learn a feature map of (#batch, R*R, h, w), then call torch.nn.functional.pixel_shuffle. It behaves the same way as transposed convolution: total loss of spatial information.
2. Duplicate the map to be upscaled, of size 15x15, R*R times using torch.Tensor.expand(): my_map.expand(-1, R*R, -1, -1). Pixel shuffle then seems to upscale the map directly, as it is. Two downsides (R=32):
   - Since I duplicate each pixel of the small map 32x32 times, the upscaled map is not smooth compared to F.interpolate. It is pixelated, and it does not look like it was upscaled in the usual sense (it looks like what you get when you zoom far into an image and start seeing the pixels!). At this point, the main advantage of interpolation is that it actually interpolates (i.e., it adds new values to cover the new, larger scale), while my duplication approach does not add anything new.
   - Probably due to this unusual way of upscaling, after the first update during training, the layer before this operation starts to output ONLY large positive values (up to 10^3). So, it is a problem. The network does not learn (compared to when using interpolation).
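The duplication approach can be checked on a small example. The sketch below uses R = 4 instead of 32 and an `arange` ramp as the input map, both of which are assumptions made only to keep the example small. It compares the pixel-shuffled result against F.interpolate in nearest and bilinear modes.

```python
import torch
import torch.nn.functional as F

R = 4  # small factor for illustration; the post above uses R = 32
small = torch.arange(15 * 15, dtype=torch.float32).reshape(1, 1, 15, 15)

# Duplicate the single map into R*R identical channels (expand returns a view).
stacked = small.expand(-1, R * R, -1, -1)

# pixel_shuffle rearranges (N, C*R*R, h, w) into (N, C, R*h, R*w).
shuffled = F.pixel_shuffle(stacked, R)

nearest = F.interpolate(small, scale_factor=R, mode="nearest")
bilinear = F.interpolate(small, scale_factor=R, mode="bilinear", align_corners=False)

print(shuffled.shape)
print("same as nearest: ", torch.equal(shuffled, nearest))
print("same as bilinear:", torch.equal(shuffled, bilinear))
```

With identical channels, every RxR output block is filled with one input value, so the result should match nearest-neighbor upscaling and not the bilinear one. That would be consistent with the pixelated look described above.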
Note: torch.nn.functional.pixel_shuffle is deterministic.
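A rough way to probe this kind of claim is PyTorch's deterministic mode, torch.use_deterministic_algorithms, which makes operations that PyTorch knows to be nondeterministic raise a RuntimeError. The helper name `passes_deterministic_mode` below is hypothetical, and the check is only a sketch: it catches operations that PyTorch flags in the installed version, nothing more.

```python
import torch
import torch.nn.functional as F


def passes_deterministic_mode(fn, x):
    """Run forward and backward of fn in deterministic mode.

    Returns False if PyTorch raises a RuntimeError, which in this mode usually
    means a flagged nondeterministic op was hit (other RuntimeErrors would also
    land here). Hypothetical helper for illustration.
    """
    previous = torch.are_deterministic_algorithms_enabled()
    torch.use_deterministic_algorithms(True)
    try:
        fn(x).sum().backward()
        return True
    except RuntimeError:
        return False
    finally:
        # Restore whatever setting was active before the check.
        torch.use_deterministic_algorithms(previous)


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1, 16, 15, 15, device=device, requires_grad=True)
print(passes_deterministic_mode(lambda t: F.pixel_shuffle(t, 4), x))
```

The same helper can be pointed at F.interpolate on a CUDA tensor to see whether the installed version flags its backward pass.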
So, for now, I have not found anything that does the job as well as F.interpolate.