CUDA op implementation

Are there any docs on implementing a CUDA op in THCUNN? Thanks!

Here’s an example of implementing SpatialGridSamplerBilinear in THCUNN: https://github.com/pytorch/pytorch/pull/2737/files

Hi Richard, I would like to know how you implemented SpatialGridSamplerBilinear for backward propagation. It seems that there is no formula for the gradient of SpatialGridSamplerBilinear. Any reference for it?
By the way, why didn't you implement the CUDA code using shared memory to speed up the computation?

I imagine you could use the output rule of SpatialGridSamplerBilinear and differentiate that to find what the gradient should be. It is a little complicated though and I don’t know where to find more information on this (https://arxiv.org/pdf/1506.02025.pdf is the paper on Spatial Transformer Networks but I don’t think it describes a grid sampler like this).
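For concreteness, here is a sketch of the bilinear case in my own notation (under the usual mapping from normalized grid coordinates in [-1, 1] to pixel coordinates; this is not taken verbatim from the paper or the THNN code):

```latex
% Map a normalized grid value (x, y) in [-1, 1] to pixel coordinates:
\[ i_x = \frac{x + 1}{2}(W - 1), \qquad i_y = \frac{y + 1}{2}(H - 1) \]
% With corner indices i_x^0 = \lfloor i_x \rfloor, i_x^1 = i_x^0 + 1 and
% weights w_1^x = i_x - i_x^0, w_0^x = 1 - w_1^x (analogously in y):
\[ O = \sum_{a \in \{0,1\}} \sum_{b \in \{0,1\}} w_a^x \, w_b^y \, I[i_y^b, i_x^a] \]
% Differentiating that output rule gives both pieces of the backward pass:
\[ \frac{\partial O}{\partial I[i_y^b, i_x^a]} = w_a^x \, w_b^y, \qquad
   \frac{\partial O}{\partial i_x} = \sum_{b \in \{0,1\}} w_b^y \,
   \bigl( I[i_y^b, i_x^1] - I[i_y^b, i_x^0] \bigr) \]
% The chain rule with d i_x / d x = (W - 1) / 2 then gives the grid gradient.
```

The first derivative is what updateGradInput accumulates (scaled by gradOutput), and the second one, combined with the chain rule, gives the gradient with respect to the sampling grid.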

I implemented the CUDA code by translating https://github.com/pytorch/pytorch/blob/master/torch/lib/THNN/generic/SpatialGridSamplerBilinear.c directly to CUDA. Feel free to toss up a PR if you think shared memory will help speed up the computation! (I’m not entirely sure what you mean, so it would be great if you could elaborate :slight_smile: )
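To give a feel for what that translation looks like, here is a minimal, hedged sketch of the forward pass (one thread per output location, a single sample of the batch, and zero padding outside the image are all my own simplifying assumptions; the kernel and helper names are made up, not the actual THCUNN symbols):

```cuda
#include <cuda_runtime.h>
#include <math.h>

// Read one pixel with zero padding outside the image bounds (assumption,
// not necessarily what the real op does at the borders).
__device__ inline float get_pixel(const float* input, int H, int W,
                                  int c, int y, int x) {
  if (x < 0 || x >= W || y < 0 || y >= H) return 0.f;
  return input[c * H * W + y * W + x];
}

__global__ void bilinear_sample_kernel(
    const float* __restrict__ input,   // [C, H, W]
    const float* __restrict__ grid,    // [H_out, W_out, 2], values in [-1, 1]
    float* __restrict__ output,        // [C, H_out, W_out]
    int C, int H, int W, int H_out, int W_out) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  int n_out = H_out * W_out;
  if (idx >= n_out) return;

  int h = idx / W_out;
  int w = idx % W_out;

  // Normalized grid coordinates -> pixel coordinates.
  float x = grid[(h * W_out + w) * 2 + 0];
  float y = grid[(h * W_out + w) * 2 + 1];
  float ix = (x + 1.f) * 0.5f * (W - 1);
  float iy = (y + 1.f) * 0.5f * (H - 1);

  // Four corner indices and bilinear weights.
  int ix0 = (int)floorf(ix), iy0 = (int)floorf(iy);
  int ix1 = ix0 + 1, iy1 = iy0 + 1;
  float wx1 = ix - ix0, wx0 = 1.f - wx1;
  float wy1 = iy - iy0, wy0 = 1.f - wy1;

  for (int c = 0; c < C; ++c) {
    float v = wx0 * wy0 * get_pixel(input, H, W, c, iy0, ix0)
            + wx1 * wy0 * get_pixel(input, H, W, c, iy0, ix1)
            + wx0 * wy1 * get_pixel(input, H, W, c, iy1, ix0)
            + wx1 * wy1 * get_pixel(input, H, W, c, iy1, ix1);
    output[c * n_out + h * W_out + w] = v;
  }
}
```

You would launch it with something like one thread per output pixel, e.g. `bilinear_sample_kernel<<<(H_out*W_out + 255)/256, 256>>>(...)`; the real op of course also loops over the batch and computes the grid gradient.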

In general, my advice for implementing updateGradInput for a new operation is to write down the expression for updateOutput, differentiate it by hand, and then put that into code.
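For the sampler above, that works out to roughly the following hedged sketch of the gradInput half of updateGradInput (same simplifying assumptions and made-up names as the forward sketch; a real implementation would accumulate the grid gradient in the same pass):

```cuda
#include <cuda_runtime.h>
#include <math.h>

__global__ void bilinear_sample_backward_input_kernel(
    const float* __restrict__ grad_output, // [C, H_out, W_out]
    const float* __restrict__ grid,        // [H_out, W_out, 2]
    float* __restrict__ grad_input,        // [C, H, W], zero-initialized
    int C, int H, int W, int H_out, int W_out) {
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  int n_out = H_out * W_out;
  if (idx >= n_out) return;
  int h = idx / W_out, w = idx % W_out;

  // Recompute the same corner indices and weights as in the forward pass.
  float x = grid[(h * W_out + w) * 2 + 0];
  float y = grid[(h * W_out + w) * 2 + 1];
  float ix = (x + 1.f) * 0.5f * (W - 1);
  float iy = (y + 1.f) * 0.5f * (H - 1);
  int ix0 = (int)floorf(ix), iy0 = (int)floorf(iy);
  int ix1 = ix0 + 1, iy1 = iy0 + 1;
  float wx1 = ix - ix0, wx0 = 1.f - wx1;
  float wy1 = iy - iy0, wy0 = 1.f - wy1;

  for (int c = 0; c < C; ++c) {
    float go = grad_output[c * n_out + h * W_out + w];
    // dO / dI[iy_b][ix_a] = wx_a * wy_b; scatter-add the weighted gradient.
    // atomicAdd because several output locations may sample the same pixel.
    if (iy0 >= 0 && iy0 < H && ix0 >= 0 && ix0 < W)
      atomicAdd(&grad_input[c * H * W + iy0 * W + ix0], wx0 * wy0 * go);
    if (iy0 >= 0 && iy0 < H && ix1 >= 0 && ix1 < W)
      atomicAdd(&grad_input[c * H * W + iy0 * W + ix1], wx1 * wy0 * go);
    if (iy1 >= 0 && iy1 < H && ix0 >= 0 && ix0 < W)
      atomicAdd(&grad_input[c * H * W + iy1 * W + ix0], wx0 * wy1 * go);
    if (iy1 >= 0 && iy1 < H && ix1 >= 0 && ix1 < W)
      atomicAdd(&grad_input[c * H * W + iy1 * W + ix1], wx1 * wy1 * go);
  }
}
```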


Thank you for your nice reply.

I think reading the data from global memory into shared memory and then computing on the shared copy may reduce latency.
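What I have in mind is roughly the following generic pattern (just an illustration, not the grid sampler itself, whose reads are data-dependent): each block stages a tile of global memory in shared memory once, synchronizes, and then every thread computes from the fast shared copy instead of re-reading global memory. Shown here for a simple 3-point average; launch with blockDim.x == TILE.

```cuda
#include <cuda_runtime.h>

#define TILE 256

__global__ void smooth3_kernel(const float* __restrict__ in,
                               float* __restrict__ out, int n) {
  __shared__ float tile[TILE + 2];          // tile plus one halo cell per side
  int g = blockIdx.x * blockDim.x + threadIdx.x;  // global index
  int t = threadIdx.x + 1;                  // index inside the shared tile

  // Stage the tile (with zero-filled halo) from global into shared memory.
  tile[t] = (g < n) ? in[g] : 0.f;
  if (threadIdx.x == 0)              tile[0]     = (g > 0)     ? in[g - 1] : 0.f;
  if (threadIdx.x == blockDim.x - 1) tile[t + 1] = (g + 1 < n) ? in[g + 1] : 0.f;
  __syncthreads();

  // Compute from shared memory only.
  if (g < n)
    out[g] = (tile[t - 1] + tile[t] + tile[t + 1]) / 3.f;
}
```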