Spatial Transformer Networks: boundary grid interpolation behaviour

Hi there, just a quick insight into how `grid_sample` behaves when grid values fall outside [-1, 1].
When values are far outside the range, it simply outputs 0, and a zero gradient is computed from the resulting sample, which makes sense.
But when values are close to -1 or 1, it interpolates between the actual values of the feature map you are sampling from and 0, as if the feature map were zero-padded.
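To make the boundary behaviour concrete, here is a pure-Python sketch of zero-padded bilinear sampling (this is my own illustration, not PyTorch's actual implementation), using the mapping `px = (x + 1) / 2 * (W - 1)` from normalized to pixel coordinates. A coordinate just outside [-1, 1] still blends a real border pixel with the implicit zero padding:

```python
import math

def bilinear_sample(img, x, y):
    """Zero-padded bilinear sampling sketch.

    img: 2D list (H x W); x, y: normalized coords, in [-1, 1] when inside.
    """
    h, w = len(img), len(img[0])
    # map normalized coords to pixel space
    px = (x + 1) / 2 * (w - 1)
    py = (y + 1) / 2 * (h - 1)
    x0, y0 = math.floor(px), math.floor(py)
    wx, wy = px - x0, py - y0

    def at(i, j):
        # implicit zero padding outside the feature map
        if 0 <= i < h and 0 <= j < w:
            return img[i][j]
        return 0.0

    top = (1 - wx) * at(y0, x0) + wx * at(y0, x0 + 1)
    bot = (1 - wx) * at(y0 + 1, x0) + wx * at(y0 + 1, x0 + 1)
    return (1 - wy) * top + wy * bot

img = [[1.0, 2.0],
       [3.0, 4.0]]
print(bilinear_sample(img, 1.0, -1.0))  # exactly on the border: 2.0
print(bilinear_sample(img, 1.5, -1.0))  # just outside: 0.75 * 2.0 = 1.5, blended with zero
print(bilinear_sample(img, 5.0, -1.0))  # far outside: 0.0
```

Note that the "just outside" sample still depends on the border pixel, so a gradient flows to it, which is exactly the problem described below.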

While this behaviour is reasonable for the original use of STNs, warping feature maps for classification, it can cause problems when the STN is the last layer of your network, followed by a pixel-wise difference loss.

Context

An actual example where I encountered this problem is a recent project of mine, which failed to converge at first: https://github.com/ClementPinard/SfmLearner-Pytorch.

Basically, we try to warp frames using a displacement and a depth map so that they match a reference frame. If the movement is something like moving backward, the warped image is expected to contain gray values after warping. Here is an example: respectively, target image, ref image, warped ref, and difference (which is to be minimized).




Now, if we use the normal `grid_sample`, some mix between gray and valid pixels appears. Any optimizer will then try to reduce the gray part of that mix: even if the valid pixel being sampled is not the right one for the warped picture to be perfectly aligned, it is still better to have, say, a greenish pixel than a gray one, and this behaviour persists as long as there is a non-zero gradient toward a grayish pixel.

As a result, warpings like the earlier example become impossible to reach, because the optimizer has a way to avoid gray pixels.

The only working behaviour is then to cancel sampling entirely for values outside [-1, 1], no matter how close they are to the boundary. This produces aliased warping, but leaves the optimizer no way to minimize the photometric loss of gray pixels.

To get this particular behaviour, I used a little hack that I don't like very much: https://github.com/ClementPinard/SfmLearner-Pytorch/blob/master/inverse_warp.py#L60 . Essentially, I set grid coordinates outside [-1, 1] to 2, to make sure they are far from -1 and 1.
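The idea of the hack can be sketched like this (pure Python for illustration, function name is mine; the actual repo code does the equivalent with tensor masking): any coordinate outside [-1, 1] is pushed to 2, far enough from the boundary that `grid_sample` returns an exact 0, with no zero-blended border value and hence no gradient toward the border pixels.

```python
def mask_out_of_bounds(grid):
    """grid: list of (x, y) normalized coordinates.

    Returns a copy where any out-of-bounds coordinate pair is replaced
    by (2.0, 2.0), so sampling there yields exactly 0 and zero gradient.
    """
    masked = []
    for x, y in grid:
        if abs(x) > 1 or abs(y) > 1:
            masked.append((2.0, 2.0))  # guaranteed pure-zero sample
        else:
            masked.append((x, y))
    return masked

print(mask_out_of_bounds([(0.5, -0.25), (1.01, 0.0), (0.0, -1.5)]))
# [(0.5, -0.25), (2.0, 2.0), (2.0, 2.0)]
```

The point is that a value like 1.01, which would otherwise be blended with a border pixel, is treated the same as a value far outside the range.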

Question

Would it be a good idea to add an option to `grid_sample` to skip bilinear interpolation when out of bounds? As far as I understand, `grid_sample` is pretty much a wrapper around cuDNN's sampler, so it might not be easy to get a very efficient implementation, but maybe there is something better than my hack to solve this problem.

Thanks !
Clément


There’s a better behaviour for the grid sampler being worked on: https://github.com/pytorch/pytorch/issues/2625.

I didn't know about it! Thanks!