Does nn.functional.grid_sample backpropagate the grad to grid?

torch.nn.functional.grid_sample(input, grid, mode='bilinear', padding_mode='zeros')

Hello! The bilinear-interpolation-based grid_sample is often used to compute affine transformations, or in spatial transformer networks.

I already know that this function propagates the grad to the input data,
but does it also propagate the grad to the input grid?


@smth @Smrutiranjan_Sahu @system could you please help me a little?

Yes, it computes the gradient w.r.t. grid.
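A quick way to check this yourself (a minimal sketch; `affine_grid` is only used here to build an identity sampling grid, and `align_corners` is passed explicitly because newer PyTorch versions warn without it):

```python
import torch
import torch.nn.functional as F

# Input feature map and a sampling grid, both tracking gradients.
inp = torch.randn(1, 1, 4, 4, requires_grad=True)
# Identity affine transform -> grid of shape (N, H_out, W_out, 2).
grid = F.affine_grid(torch.eye(2, 3).unsqueeze(0), size=(1, 1, 4, 4),
                     align_corners=False)
grid.requires_grad_(True)

out = F.grid_sample(inp, grid, mode='bilinear', padding_mode='zeros',
                    align_corners=False)
out.sum().backward()

# Both the input and the grid receive gradients.
print(inp.grad.shape)   # torch.Size([1, 1, 4, 4])
print(grid.grad.shape)  # torch.Size([1, 4, 4, 2])
```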

Although you will get a zero grad for grid if you are using it with nearest-neighbor sampling.
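You can see the difference between the two modes directly (a small sketch with random data; the grid is cloned so each mode gets its own gradient):

```python
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 5, 5, requires_grad=True)
# Random sampling locations in [-1, 1]; built as a leaf so .grad is kept.
grid = (torch.rand(1, 3, 3, 2) * 2 - 1).requires_grad_(True)

# 'nearest' rounds each coordinate to the closest pixel, so the output
# is piecewise constant in the grid values -> all-zero grid gradient.
out = F.grid_sample(inp, grid, mode='nearest', align_corners=False)
out.sum().backward()
print(grid.grad.abs().max())  # tensor(0.)

# 'bilinear' weights vary smoothly with the coordinates, so the grid
# gradient is generally nonzero.
grid2 = grid.detach().clone().requires_grad_(True)
out2 = F.grid_sample(inp, grid2, mode='bilinear', align_corners=False)
out2.sum().backward()
```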

Why is the grad zero with 'nearest' mode but not with 'bilinear' mode? Big thanks!
@SimonW I am a little bit confused after hours of thinking: how does the backend kernel actually calculate the grad for each grid point? I just can't figure it out, many thanks.

Because with nearest-neighbor sampling the output is piecewise constant in the grid coordinates: the gradient is zero almost everywhere and undefined at the jumps. It's the same reason why ceil(x) gives all-zero gradients.
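The ceil analogy in one line:

```python
import torch

# ceil(x) is a step function: flat (derivative 0) everywhere it is
# differentiable, so autograd returns a zero gradient.
x = torch.tensor(1.3, requires_grad=True)
torch.ceil(x).backward()
print(x.grad)  # tensor(0.)
```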

In the forward pass, each output pixel is a linear interpolation (weighted sum) of input pixels, where the interpolation weights are computed from the mode and the grid values. In most cases, e.g., bilinear, this computation is simple and differentiable. So at a high level it goes grad_output -> grad of interpolation weights -> grad of grid. (In the real code there are a lot of other detailed optimizations.) If you are interested, the CPU kernel is at … and the GPU kernel is at …
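The chain grad_output -> weights -> grid can be made concrete with a hand-rolled bilinear sample at one continuous point. This is a simplified sketch in pixel coordinates, not the actual kernel code (which works in normalized [-1, 1] coordinates and handles padding):

```python
import torch

img = torch.arange(16.0).reshape(4, 4)  # toy 4x4 image

def bilinear(img, x, y):
    """Sample img at continuous pixel coordinates (x, y)."""
    x0, y0 = x.floor().long(), y.floor().long()
    dx, dy = x - x0, y - y0
    # The four interpolation weights are linear in dx and dy, hence
    # differentiable in the coordinates (almost everywhere).
    w00 = (1 - dx) * (1 - dy)
    w01 = dx * (1 - dy)
    w10 = (1 - dx) * dy
    w11 = dx * dy
    return (w00 * img[y0, x0]     + w01 * img[y0, x0 + 1] +
            w10 * img[y0 + 1, x0] + w11 * img[y0 + 1, x0 + 1])

x = torch.tensor(1.3, requires_grad=True)
y = torch.tensor(2.6, requires_grad=True)
out = bilinear(img, x, y)
out.backward()
# Autograd differentiates through the weights back to the coordinates;
# for this image the analytic values are close to 1.0 and 4.0.
print(x.grad, y.grad)
```

Replacing `dx`/`dy` with a rounding step (nearest mode) would make the weights constant, which is exactly why the grid gradient vanishes there.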


What an amazing reply, I hope it didn't waste your time @SimonW :+1::+1::point_up_2:
I understand it a little better now, and I will dig further into the source code you posted.
If I have further questions I'd like to have your advice, many thanks!