How does grid_sample(x, grid) work?

I don’t know how grid_sample() computes the output based on the grid and the input. Could anyone give me an example? Thank you.

Hi,
The doc on grid_sample contains details on how the output is computed. Let us know if you need more detail.

This function just passes the parameters along; the core is written in C, so I’m confused. Thank you for your reply!

Hi,
the implementation is in this file. In particular, the forward method (which computes the output) is this one.
Here is a quick description of what the C code is doing (a Python sketch of the same logic follows the list):

  • Here it checks that the input is valid.
  • Here it initializes things and gives the output the right size.
  • Then it iterates over the batch, then height, then width.
  • For each of these, it gets the input coordinates ix and iy here.
  • Here it computes the 4 points around it that will be used for the interpolation.
  • Here it computes the weights for each of these 4 points.
  • Here it clips the coordinates if you asked for it.
  • Then it iterates over the channels.
  • It gets the values of the 4 points, computes the interpolated value, then saves it in the output tensor.
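
To make this concrete, here is a minimal, unoptimized Python sketch of that same loop (my own sketch, not the actual kernel; it assumes bilinear mode, zeros padding, and the original align_corners=True convention):

    import math
    import torch

    def naive_grid_sample_2d(input, grid):
        # input: (N, C, H, W); grid: (N, H_out, W_out, 2), values in [-1, 1],
        # last dim ordered (x, y). Bilinear, "zeros" padding, align_corners=True.
        N, C, H, W = input.shape
        _, H_out, W_out, _ = grid.shape
        output = torch.zeros(N, C, H_out, W_out, dtype=input.dtype)
        for n in range(N):
            for h in range(H_out):
                for w in range(W_out):
                    # Unnormalize [-1, 1] to pixel coordinates ix, iy.
                    ix = (grid[n, h, w, 0].item() + 1) * (W - 1) / 2
                    iy = (grid[n, h, w, 1].item() + 1) * (H - 1) / 2
                    # The 4 integer points surrounding (ix, iy).
                    ix0, iy0 = math.floor(ix), math.floor(iy)
                    ix1, iy1 = ix0 + 1, iy0 + 1
                    # Bilinear weights for each of the 4 points.
                    wx1, wy1 = ix - ix0, iy - iy0
                    wx0, wy0 = 1 - wx1, 1 - wy1
                    corners = [(ix0, iy0, wx0 * wy0), (ix1, iy0, wx1 * wy0),
                               (ix0, iy1, wx0 * wy1), (ix1, iy1, wx1 * wy1)]
                    for c in range(C):
                        val = 0.0
                        for x, y, wgt in corners:
                            # "zeros" padding: out-of-bounds points contribute nothing.
                            if 0 <= x < W and 0 <= y < H:
                                val += wgt * input[n, c, y, x].item()
                        output[n, c, h, w] = val
        return output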
That’s exactly what I want! Many thanks! :+1:

It seems grid_sample supports 2D and 3D images.

So how is the padding mode implemented?

Are you trying to use the padding_mode option of grid_sample()? If so, you can find it documented with an explanation here, and if there is something in the explanation you don’t understand, feel free to ask.

Or are you trying to implement a new type of padding for grid_sample()? That is, other than the existing zeros, border, and reflection. If so, then that’s a completely different matter.

It’s not clear from your question which one you need help with.

Actually, I want to implement padding_mode="zeros" in TensorFlow, so I want to know how the padding mode is implemented in the PyTorch source code…

Oh, okay, so here is the deal:
PyTorch currently has 3 different underlying implementations of grid_sample() (a vectorized CPU 2D version, a non-vectorized CPU 3D version, and a CUDA implementation for both 2D and 3D), but their behavior is essentially supposed to be the same.

In my opinion, the easiest of the three to understand, if you just want to get the basic idea, is the CUDA version, which you can find here for the 2D case.

The important lines in terms of zero padding are these, which implicitly perform the zero padding by calling within_bounds_2d() so that each term is added to the bilinear interpolation only if it is in bounds. Any out-of-bounds grid points will get 0, of course, since nothing will then be added to the 0 from line 198.

Note that this out-of-bounds check does not affect the border and reflection padding modes, since in those cases, the grid points will have previously been brought in-bounds by the clip and reflect operations here.
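
For reference, the clip and reflect operations boil down to something like this (a Python sketch of the idea, using align_corners=True-style reflection about the borders 0 and size - 1, not the exact CUDA helpers):

    def clip_coordinates(x, size):
        # "border" padding: clamp the pixel coordinate into [0, size - 1].
        return min(max(x, 0.0), size - 1.0)

    def reflect_coordinates(x, size):
        # "reflection" padding: fold the coordinate into [0, size - 1]
        # by reflecting at the two borders as many times as needed.
        if size == 1:
            return 0.0
        span = 2.0 * (size - 1)      # one full back-and-forth period
        x = abs(x) % span            # fold into [0, span)
        return span - x if x > size - 1 else x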

Now, if you want to implement a similar zero padding behavior in TensorFlow, here’s how I would do it:
Take a look at this _interpolate() function. First of all, I should note that this function is not quite the same as the PyTorch grid_sample() in two ways:

  1. It is not meant to be called externally, only as part of transformer(), and so it actually takes the grid as two flattened tensors x and y. Of course, if you follow the code on these lines, you can figure out how to reformat your grid this way. (No need to multiply by an affine matrix if you already have the grid you want to use. This just produces an affine grid.)

  2. It miscalculates the conversion from the [-1,+1] of the grid to pixel indices, and is in fact off by half a pixel. If half a pixel doesn’t bother you, then great. If it does, fixing it is a bit more involved, but also possible.
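
To illustrate the half-pixel discrepancy (my reconstruction, not a quote of either codebase):

    W = 8          # example width
    x = 1.0        # a normalized coordinate in [-1, 1]

    # Roughly how the STN _interpolate converts to pixel units:
    x_pix_stn = (x + 1.0) * W / 2.0        # -1 -> 0.0, +1 -> 8.0

    # A conversion that lands on pixel centers instead
    # (grid_sample's align_corners=True formulation):
    x_pix = (x + 1.0) * (W - 1) / 2.0      # -1 -> 0.0, +1 -> 7.0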

So supposing you can get over these two hurdles, how do you implement zero padding? The first thing you have to do is remove the coordinate clipping on these lines, since you want to know when a grid point is out of bounds. Second, you have to add a check for out-of-bounds grid points and either zero out their values here or their interpolation weights here.

That should be more than enough if you’re running this on a GPU. If you’re running it on a CPU, then you also need to be careful here that you’re not going out of bounds in the flat 1D tensor, because otherwise you will probably get an out-of-bounds error.
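
Putting both steps together, the gather could look something like this (a NumPy sketch under my own naming; gather_with_zero_padding and its signature are hypothetical):

    import numpy as np

    def gather_with_zero_padding(flat_img, ix, iy, H, W, weight):
        # flat_img: flattened (H * W,) image; ix, iy: integer corner indices.
        # (1) Zero the interpolation weight of any out-of-bounds corner.
        in_bounds = (ix >= 0) & (ix < W) & (iy >= 0) & (iy < H)
        weight = np.where(in_bounds, weight, 0.0)
        # (2) Clamp the indices actually used for the flat gather, so the
        # 1D indexing stays legal even for the zero-weighted corners.
        ix_safe = np.clip(ix, 0, W - 1)
        iy_safe = np.clip(iy, 0, H - 1)
        return weight * flat_img[iy_safe * W + ix_safe]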

Hopefully this is enough to get you where you need.

Oh, I see, thank you so much!

Sorry to bother you, but I have some doubts about the creation of the grid parameter of grid_sample(). Can I use a CNN to generate the N×H×W×2 grid directly, and get a valid grid through training? Is this idea feasible?

@bnehoran Thank you for your detailed explanation!
I have a question about the API. Why does grid, i.e. the coordinates to sample from the source tensor, have to be in the range [-1, 1]? Since in the C++/CUDA implementation the [-1, 1] coordinates still have to be converted to the scale of the source tensor size, why not let the grid values be real pixel coordinates rather than scaled ones?

@TG_N Yes! That is a great idea!
In fact, this is how a number of convolutional neural nets for image registration and optical flow are structured.
The CNN predicts a flow/grid/mapping/displacement field, which is then passed to grid_sample() or the like (you may have to do some conversions to fit the conventions), and you then backpropagate through the result.
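
As a rough sketch of the wiring (the FlowWarp module and its single-conv flow head are hypothetical and just for illustration; real registration and flow networks are of course much deeper):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FlowWarp(nn.Module):
        # Hypothetical module: a conv layer predicts per-pixel offsets,
        # which are added to an identity grid and passed to grid_sample().
        def __init__(self, channels):
            super().__init__()
            self.flow = nn.Conv2d(channels, 2, kernel_size=3, padding=1)

        def forward(self, x):
            _, _, h, w = x.shape
            # Identity grid in [-1, 1], last dim ordered (x, y).
            ys = torch.linspace(-1, 1, h, device=x.device)
            xs = torch.linspace(-1, 1, w, device=x.device)
            base_y, base_x = torch.meshgrid(ys, xs, indexing="ij")
            base = torch.stack((base_x, base_y), dim=-1)   # (H, W, 2)
            # Predicted offsets, reshaped from (N, 2, H, W) to (N, H, W, 2).
            offset = self.flow(x).permute(0, 2, 3, 1)
            grid = base.unsqueeze(0) + offset
            # grid_sample is differentiable w.r.t. both x and grid,
            # so gradients flow back into the CNN during training.
            return F.grid_sample(x, grid, align_corners=True)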

@harryhan618 that’s a really good question.

The convention of representing the image space using [-1, 1] comes from the original 2015 paper that introduced Spatial Transformer Networks (on which PyTorch’s grid_sample() and similar functions in other ML frameworks are based).

A reasonable justification for this convention is that it allows coordinates to be specified independently of the resolution of the underlying image. That is, the same image at different resolutions should still be sampled at the same locations.
So it’s a sort of resolution-agnostic representation.

That said, I agree that there should be an option to pass in the grid in unnormalized pixel units. I’ve pushed for such an option in the past, and maybe if I have extra time at some point, I’ll go through and implement it.
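
In the meantime, you can do the conversion yourself. A sketch (pixel_grid_to_normalized is a hypothetical helper; the two branches reflect the align_corners=True and align_corners=False conventions):

    import torch

    def pixel_grid_to_normalized(grid_px, H, W, align_corners=True):
        # grid_px: (..., 2) with x (width index) then y (height index)
        # in pixel units. Returns the [-1, 1] grid grid_sample() expects.
        x, y = grid_px[..., 0], grid_px[..., 1]
        if align_corners:
            # -1 and +1 refer to the centers of the corner pixels.
            x = 2.0 * x / (W - 1) - 1.0
            y = 2.0 * y / (H - 1) - 1.0
        else:
            # -1 and +1 refer to the outer edges of the corner pixels.
            x = (2.0 * x + 1.0) / W - 1.0
            y = (2.0 * y + 1.0) / H - 1.0
        return torch.stack((x, y), dim=-1)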

Is there a way I can pad with a custom value which is not zero? Let’s say I want to pad with 1?

@gfotedar There’s no direct way of padding with a non-zero value, but there are some workarounds you can try. Of course, it depends on your use case whether these workarounds are good enough for you.

I think the easiest one is this:
Suppose you want to use a padding value of val. Start by subtracting val from the input image you’re intending to sample, then pass it to grid_sample() with padding_mode="zeros", and finally add val back to the result.
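
In code, the workaround is just a couple of lines (a sketch; grid_sample_const_pad is my own name for it):

    import torch.nn.functional as F

    def grid_sample_const_pad(input, grid, val):
        # Shift so the desired pad value becomes 0, sample with "zeros"
        # padding, then shift back; out-of-bounds samples come out as val.
        return F.grid_sample(input - val, grid, padding_mode="zeros") + val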

There are other, more complicated ways, like adding a border to your image and using "border" padding, but I would highly recommend not doing this since you’d have to adjust the grid to match, and it’s just very easy to get it wrong.

What is it specifically that you’re trying to accomplish, by the way?

Hi, I want to interpolate values from a sparse (masked) image. Specifically, I have a grid, a mask, and an input. Only the points of the input whose mask is 1 should contribute to the interpolation. Is there any way to achieve this with the grid_sample function?

I am confused about the example for the reflection padding_mode.

In the doc:

padding_mode="reflection" : use values at locations reflected by the border for out-of-bound grid locations. For location far away from the border, it will keep being reflected until becoming in bound, e.g., (normalized) pixel location x = -3.5 reflects by border -1 and becomes x' = 1.5 , then reflects by border 1 and becomes x'' = -0.5 .

But why does reflecting by border 1 give x'' = -0.5 and not x'' = 0.5?

And if x = 6.5, does that mean x' = -4.5, x'' = 2.5, and then x''' = -0.5, or x''' = 0.5?

Looking forward to your reply, and thanks for your help!

I’m not sure: does grid_sample actually perform forward warping or backward warping?

Maybe it’s better to ask about the combination of:
grid = F.affine_grid(theta, x.size())
x = F.grid_sample(x, grid)

Does it look at every pixel location in the dst image and ask, “which source pixels should I place here?” (backward), or does it simply move the src image to the dst image using theta, and hence might leave holes (forward)?

Many thanks!