Struggling to Understand F.grid_sample for this example

I have an input tensor of size [1,32,296,400]

and I have a pixel set of [1, 56000, 400, 2]

After applying grid_sample with mode='bilinear' I have [1, 32, 56000, 400]

Can I know what exactly happened here?


In both tensors, the first dimension (dim=0) is the batch. Dimension 1 of the input tensor is the number of channels (32), and the remaining two dimensions form a grid of size 296 x 400 (height x width). The pixel set describes an output grid of size 56000 x 400, which is populated using values looked up in the input grid. Its last dimension (size 2) holds, for each output location, the (x, y) coordinates at which to read the input. These coordinates are normalized, so -1 and 1 correspond to opposite edges of the input, with x running along the width and y along the height.
mode='bilinear' means that if the coordinates fall between pixel centers (non-integer positions), the resulting vector (of size 32) is linearly interpolated from the four neighboring pixels in the input grid.
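To make the interpolation concrete, here is a minimal sketch that samples a single location and reproduces the value by hand. The tiny 3 x 4 single-channel input, the query point (0.2, -0.5), and the choice of align_corners=True are assumptions made for illustration; they are not taken from the question.

```python
import math

import torch
import torch.nn.functional as F

# Toy input: 1 batch, 1 channel, H=3, W=4, filled with 0..11 (illustrative only).
H, W = 3, 4
inp = torch.arange(H * W, dtype=torch.float32).reshape(1, 1, H, W)

# One query location in normalized coordinates: x along width, y along height.
x, y = 0.2, -0.5
grid = torch.tensor([[[[x, y]]]])  # shape (N=1, H_out=1, W_out=1, 2)

sampled = F.grid_sample(inp, grid, mode='bilinear', align_corners=True)

# Manual version. With align_corners=True, -1 and 1 map to the centers
# of the first and last pixels, so the pixel coordinate is:
px = (x + 1) / 2 * (W - 1)
py = (y + 1) / 2 * (H - 1)
x0, y0 = math.floor(px), math.floor(py)
x1, y1 = x0 + 1, y0 + 1
wx, wy = px - x0, py - y0  # fractional offsets used as weights

# Weighted sum of the four neighboring pixels.
manual = (
    inp[0, 0, y0, x0] * (1 - wx) * (1 - wy)
    + inp[0, 0, y0, x1] * wx * (1 - wy)
    + inp[0, 0, y1, x0] * (1 - wx) * wy
    + inp[0, 0, y1, x1] * wx * wy
)

print(sampled.item(), manual.item())  # the two values should agree
```

With several channels, the same weights are applied to every channel, which is why each sampled location yields a vector of size C.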

Hence you obtain a result of dim (1 x 32 x 56000 x 400) because your batch is one and you sampled a 32 x 296 x 400 grid at 56000 x 400 locations.
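The shape bookkeeping can be checked with a short sketch. To keep it quick to run, the sampling grid here is 56 x 40 rather than 56000 x 400; that smaller size, the random data, and align_corners=True are assumptions for the example, while the input shape matches the question.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

inp = torch.randn(1, 32, 296, 400)        # (N, C, H_in, W_in), as in the question
grid = torch.rand(1, 56, 40, 2) * 2 - 1   # (N, H_out, W_out, 2), values in [-1, 1]

# Pin two sampling locations to opposite corners of the input: (x, y).
grid[0, 0, 0] = torch.tensor([-1.0, -1.0])  # top-left
grid[0, 0, 1] = torch.tensor([1.0, 1.0])    # bottom-right

out = F.grid_sample(inp, grid, mode='bilinear', align_corners=True)
print(out.shape)  # torch.Size([1, 32, 56, 40])

# With align_corners=True the corner coordinates land exactly on the corner
# pixels, so the sampled 32-vectors equal the input's corner vectors.
top_left_matches = torch.allclose(out[0, :, 0, 0], inp[0, :, 0, 0], atol=1e-5)
bottom_right_matches = torch.allclose(out[0, :, 0, 1], inp[0, :, -1, -1], atol=1e-5)
print(top_left_matches, bottom_right_matches)
```

The output keeps N and C from the input and takes its spatial size from the grid, so a [1, 56000, 400, 2] grid gives [1, 32, 56000, 400] in the same way.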

Hope this helps.