Hi @MariosOreo ,
Thanks for your comment!
Unfortunately, in my case the upscaling factor is huge (x32), starting from a single `15x15` map. I assume that is why `F.interpolate` is extremely unstable, and also why transposed convolution (as you mentioned) loses information (it requires 4 transposed convolutions). For various reasons, the learning in my case is done without a target at the output, which makes the spatial loss of information worse. (I think that most architectures that rely on upscaling, in segmentation for instance, work because the supervised target provided at the output allows the transposed convolution to learn something spatial.) In my experiments, the transposed convolution entirely loses the spatial information that was present in the `15x15` map (compared to `F.interpolate`, which nicely preserves the learned abstract spatial activations).
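To make the two options concrete, here is a minimal shape-only sketch of upscaling a `15x15` map by x32. The interpolation mode (`bilinear`) and the split of x32 into strides `(4, 2, 2, 2)` are assumptions for illustration; they are not necessarily what I use in my network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

R = 32  # upscaling factor
x = torch.randn(1, 1, 15, 15)  # a single 15x15 map

# Fixed (non-learned) upsampling: 15x15 -> 480x480.
# mode="bilinear" is an assumption made for this sketch.
up_interp = F.interpolate(x, scale_factor=R, mode="bilinear", align_corners=False)

# Learned upsampling with 4 transposed convolutions.
# The strides are a hypothetical split of x32. With kernel_size == stride
# and no padding, each layer multiplies H and W by its stride.
strides = (4, 2, 2, 2)
up_deconv = nn.Sequential(
    *[nn.ConvTranspose2d(1, 1, kernel_size=s, stride=s) for s in strides]
)
y = up_deconv(x)

print(up_interp.shape, y.shape)
```

Both paths produce the same output size; the difference is that the interpolation has no parameters, while the transposed convolutions have to learn their kernels from whatever signal the loss provides.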
I wonder whether people working on GANs and super-resolution come across this instability of the interpolation. (By the way, it makes learning unstable and difficult, since the backward pass is not deterministic, which makes the gradient non-deterministic as well.)
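One way to check the non-determinism is to backpropagate twice through `F.interpolate` on identical data and compare the two input gradients. The helper below is a hypothetical sketch (the name `max_grad_mismatch`, the `bilinear` mode and the random weighting are my own choices); the PyTorch documentation notes that the gradients may be non-deterministic on a CUDA device, so the comparison is mainly informative there.

```python
import torch
import torch.nn.functional as F

def max_grad_mismatch(device="cpu", R=32, mode="bilinear"):
    """Backpropagate twice through F.interpolate on identical data and
    return the largest absolute difference between the two input gradients."""
    torch.manual_seed(0)
    x = torch.randn(1, 1, 15, 15, device=device)
    # Fixed random weights so that the scalar loss depends on every output pixel.
    w = torch.randn(1, 1, 15 * R, 15 * R, device=device)
    grads = []
    for _ in range(2):
        xi = x.clone().requires_grad_(True)
        up = F.interpolate(xi, scale_factor=R, mode=mode, align_corners=False)
        (up * w).sum().backward()
        grads.append(xi.grad.detach().clone())
    return (grads[0] - grads[1]).abs().max().item()

device = "cuda" if torch.cuda.is_available() else "cpu"
print(device, max_grad_mismatch(device))
```

A non-zero value on the GPU would confirm that two identical backward passes disagree.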
I checked some super-resolution (SR) papers, and it seems that interpolation and transposed convolution are the go-to tools. However, the interpolation is often applied to the input image, which is then fed to the network. (I do not think they include it in the training graph, so it does not have an impact on stability.)
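As I understand that setup, it looks roughly like the sketch below. The factor `R = 4`, the `bicubic` mode and the small `refine` network are hypothetical placeholders, not taken from any specific paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

R = 4  # hypothetical SR factor, much smaller than my x32 case
low_res = torch.rand(1, 3, 16, 16)  # placeholder low-resolution image

# Upsample the input before the learned part of the model.
# The input does not require gradients, so nothing is backpropagated
# through the interpolation itself.
upsampled = F.interpolate(low_res, scale_factor=R, mode="bicubic", align_corners=False)

# Hypothetical refinement network that keeps the spatial size unchanged.
refine = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)
out = refine(upsampled)
out.mean().backward()  # gradients reach the conv weights only

print(upsampled.requires_grad, out.shape)
```

In my case the map to be upscaled is an intermediate activation, so the interpolation necessarily sits inside the graph and its backward pass matters.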
Actually, I am exploring another tool that mimics interpolation and was developed within the SR domain: pixel shuffle (paper). If one wants to upscale a map of size `(h, w)` by a factor of `R`, one needs `R^2` feature maps to obtain an upscaled map of size `(R*h, R*w)`. I am doing this in two ways:
- The way it is supposed to be done: learn a feature map of shape `(#batch, R*R, h, w)`, then call `torch.nn.functional.pixel_shuffle`. It behaves the same way as transposed convolution: total loss of spatial information.
- Duplicate the `15x15` map to be upscaled `R*R` times using `torch.Tensor.expand()`: `my_map.expand(-1, R*R, -1, -1)`. Pixel shuffle then seems to upscale the map right away, as it is. Two downsides (`R=32`):
  - Since I duplicate each pixel in the small map `32x32` times, the upscaled map is not smooth compared to `F.interpolate`. It is pixelated and does not look like it was upscaled in the usual sense; it looks like what you get when you zoom far into an image and start seeing individual pixels. At this point, the main advantage of interpolation is that it actually interpolates (i.e., it adds new values to cover the new, larger scale), while duplication does not add anything new.
  - Probably due to this unusual way of upscaling, after the first update during training, the layer before this operation starts to output ONLY large positive values (up to `10^3`). So this is a problem: the network does not learn (compared to when using interpolation).
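The two ways above can be sketched as follows. The random tensors stand in for learned feature maps, and the comparison against `mode="nearest"` is a check I added for illustration: since all `R*R` channels are identical, the duplication variant should reduce to nearest-neighbour upsampling, which would explain the pixelated result.

```python
import torch
import torch.nn.functional as F

R, h, w = 32, 15, 15

# Way 1: R*R feature maps (random placeholders here) -> one upscaled map.
features = torch.randn(2, R * R, h, w)   # (#batch, R*R, h, w)
upscaled = F.pixel_shuffle(features, R)  # (#batch, 1, R*h, R*w)
# Channel i*R + j of input pixel (y, x) lands at output pixel (y*R + i, x*R + j).
i, j, y, x = 3, 5, 7, 9
same_value = bool(upscaled[0, 0, y * R + i, x * R + j] == features[0, i * R + j, y, x])

# Way 2: duplicate a single map R*R times (expand returns a view, no copy).
my_map = torch.randn(1, 1, h, w)
via_shuffle = F.pixel_shuffle(my_map.expand(-1, R * R, -1, -1), R)
# Nearest-neighbour interpolation by the same integer factor, for comparison.
via_nearest = F.interpolate(my_map, scale_factor=R, mode="nearest")
matches_nearest = torch.equal(via_shuffle, via_nearest)

print(upscaled.shape, same_value, via_shuffle.shape, matches_nearest)
```

If `matches_nearest` is true, each small-map pixel simply becomes a constant `R x R` block, whereas smoother modes of `F.interpolate` blend neighbouring pixels.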
Note: `torch.nn.functional.pixel_shuffle` is deterministic.
So, for now, I have not found anything that does the job as well as `F.interpolate`.