I have a video frame (W,H,3) and an optical flow (W,H,2). I want to implement a spatial transformer layer that computes a new frame by warping this frame with the flow. Can someone give me some advice? Thank you.

You can use affine_grid() followed by grid_sample() to perform warping. Here is the documentation: http://pytorch.org/docs/master/_modules/torch/nn/functional.html#grid_sample

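In your case you already have the flow field, so applying grid_sample(Video_frame, optical_flow) would suffice. However, the flow field should be in the range [-1,1], i.e., expressed in normalized coordinates of the image/video. Also note that grid_sample expects the frame as an (N, C, H, W) tensor and the grid as an (N, H, W, 2) tensor.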
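As a minimal sketch of the affine_grid/grid_sample pairing (assuming a recent PyTorch with the `align_corners` argument, and a frame already permuted from the question's layout to the `(N, C, H, W)` layout that `grid_sample` expects; the random frame is placeholder data), an identity `theta` makes `affine_grid` produce the neutral sampling grid, and sampling with it returns the frame unchanged:

```python
import torch
import torch.nn.functional as F

# Placeholder frame batch: N=1, C=3, H=4, W=6 (grid_sample expects N x C x H x W input)
frame = torch.rand(1, 3, 4, 6)
N, C, H, W = frame.shape

# Identity affine transform: one 2x3 matrix per batch element
theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])

# affine_grid returns normalized sampling coordinates of shape (N, H, W, 2), x first, then y
base_grid = F.affine_grid(theta, size=(N, C, H, W), align_corners=True)

# Sampling the frame at its own pixel positions reproduces the frame
same = F.grid_sample(frame, base_grid, mode="bilinear", align_corners=True)

print(base_grid.shape)                         # torch.Size([1, 4, 6, 2])
print(torch.allclose(same, frame, atol=1e-5))  # True
```

The grid's last dimension holds (x, y) in normalized coordinates, which is the layout a flow field has to be converted to before it can be used as `grid`.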

Hope that helps

Could you show how to normalize the flow field to the range [-1, 1]? Thanks a lot!

Hi, I wonder how to normalize the flow field to the range [-1, 1].

Hi @IanYeung and @hubertlee915,

I also had the same problem, but I guess this is how to address it:

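Suppose your image has a shape of `[H,W]`. Then you would need to rescale the horizontal flow map (in pixels) by `2/(W-1)` and the vertical flow map by `2/(H-1)`, since the normalized range [-1,1] has length 2 and, with `align_corners=True`, spans `W-1` pixel steps horizontally and `H-1` pixel steps vertically.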
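Spelling out where that factor comes from, assuming the `align_corners=True` convention (the one that matches `np.linspace(-1, 1, W)`): pixel column $x \in \{0, \dots, W-1\}$ and row $y \in \{0, \dots, H-1\}$ map to normalized coordinates as below, so a displacement of $u$ pixels horizontally and $v$ pixels vertically becomes:

$$
x_{\text{norm}} = \frac{2x}{W-1} - 1, \qquad y_{\text{norm}} = \frac{2y}{H-1} - 1
\quad\Longrightarrow\quad
u_{\text{norm}} = \frac{2u}{W-1}, \qquad v_{\text{norm}} = \frac{2v}{H-1}
$$

With `align_corners=False`, -1 and 1 refer to the outer edges of the corner pixels instead, and the factors become $2/W$ and $2/H$.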

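If you want to do warping, this normalized flow map should be added to an x-y meshgrid spanning [-1,1] with shape `[H,W]`, something like `np.meshgrid(np.linspace(-1,1,W), np.linspace(-1,1,H))`, with the two resulting arrays stacked along the last axis (x first, then y) to form an `[H,W,2]` map. The resulting map, with a batch dimension added, can be given as the `grid` argument to the `grid_sample` function (using `align_corners=True`, which matches this linspace convention).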
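The recipe above can be sketched as a small function. This is a minimal sketch, not a library API: `warp_with_flow` is a hypothetical helper name, and it assumes the flow is given in pixels as an `(N, H, W, 2)` tensor with the horizontal component first, converted with the `2/(W-1)` and `2/(H-1)` factors that match `align_corners=True`. It performs backward warping: output pixel (x, y) is read from the input at (x + u, y + v), so to warp frame t+1 back onto frame t you would pass the flow from t to t+1.

```python
import numpy as np
import torch
import torch.nn.functional as F


def warp_with_flow(frame, flow):
    """Hypothetical helper: backward-warp `frame` with a dense pixel-unit flow.

    frame: float tensor (N, C, H, W)
    flow:  float tensor (N, H, W, 2); flow[..., 0] is the horizontal (x)
           displacement and flow[..., 1] the vertical (y) one, in pixels.
    Output pixel (x, y) is sampled from frame at (x + u, y + v).
    """
    N, C, H, W = frame.shape

    # Base (identity) grid in normalized coordinates, shape (H, W, 2), x first
    xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
    base = torch.from_numpy(np.stack([xs, ys], axis=-1)).to(frame)

    # Pixel displacements -> normalized displacements ([-1, 1] has length 2)
    scale = torch.tensor([2.0 / (W - 1), 2.0 / (H - 1)]).to(frame)
    grid = base.unsqueeze(0) + flow * scale  # broadcasts to (N, H, W, 2)

    # Bilinear sampling; positions outside the image read as zero
    return F.grid_sample(frame, grid, mode="bilinear",
                         padding_mode="zeros", align_corners=True)


# Usage: shift a small ramp image by one pixel horizontally
frame = torch.arange(20, dtype=torch.float32).reshape(1, 1, 4, 5)
flow = torch.zeros(1, 4, 5, 2)
flow[..., 0] = 1.0  # every output pixel reads from one pixel to its right
out = warp_with_flow(frame, flow)
# out[..., :-1] matches frame[..., 1:]; the last column falls outside the image and is zero-padded
```

Building the base grid with NumPy mirrors the `np.meshgrid` call from the post; `torch.meshgrid` or an identity `affine_grid` would produce the same grid without leaving PyTorch.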

Thanks! It makes sense. But I think we do not need the meshgrid function, since the flow field (of size N x H x W x 2) is already a grid.

The flow field is indeed a grid of the same size.

However, the `grid_sample` function is a sampler: for every output pixel, it reads the `input` argument at the position given by the corresponding entry of the `grid` argument. You need a neutral (identity) grid as the base sampler to map an image to itself, and the flow field comes on top of that and indicates how far each pixel has moved.

Suppose the flow is zero and you want to map an image to itself. Using the flow alone as the `grid` would then mean an all-zero grid, which samples every output pixel from the center of the image and produces a meaningless, uniform output.

You can try both the flow alone and the flow plus the [-1,1] meshgrid as the `grid`, and compare the results.
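A small sketch of that comparison, assuming a zero flow that is already in normalized units and a 5×5 random frame as placeholder data (odd sizes put a pixel exactly at the normalized center (0, 0)):

```python
import numpy as np
import torch
import torch.nn.functional as F

H, W = 5, 5
frame = torch.rand(1, 3, H, W)   # placeholder frame, (N, C, H, W)
flow = torch.zeros(1, H, W, 2)   # "no motion", already in normalized units

# Flow-only grid: every entry is (0, 0), i.e. the image center
flow_only = F.grid_sample(frame, flow, align_corners=True)

# Flow + base mesh: the identity grid plus the (zero) flow
xs, ys = np.meshgrid(np.linspace(-1, 1, W), np.linspace(-1, 1, H))
mesh = torch.from_numpy(np.stack([xs, ys], axis=-1)).float().unsqueeze(0)
flow_plus_mesh = F.grid_sample(frame, mesh + flow, align_corners=True)

# Flow-only: every output pixel is a copy of the center pixel frame[..., 2, 2]
print(torch.allclose(flow_only, frame[..., 2:3, 2:3].expand_as(frame)))  # True
# Flow + mesh: the frame is reproduced unchanged
print(torch.allclose(flow_plus_mesh, frame, atol=1e-5))  # True
```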

Just note that in my case, the image `input` argument needed to be normalized between [0,1] rather than [0,255].
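A minimal sketch of that preprocessing, assuming the frame arrives as an 8-bit NumPy array in (H, W, 3) layout, as most image libraries return it (the random array is placeholder data). `grid_sample` needs a floating-point input of the same dtype as the grid, so the frame is converted to a float `(N, C, H, W)` tensor scaled to [0,1], and converted back afterwards:

```python
import numpy as np
import torch

# Placeholder 8-bit RGB frame in (H, W, 3) layout
frame_hwc = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)

# HWC -> CHW, add a batch dimension, convert to float and scale [0, 255] -> [0, 1]
frame = torch.from_numpy(frame_hwc).permute(2, 0, 1).unsqueeze(0).float() / 255.0

# ... warp `frame` here with grid_sample ...

# Back to an (H, W, 3) uint8 array for saving or display
restored = (frame[0].permute(1, 2, 0) * 255.0).round().clamp(0, 255).to(torch.uint8).numpy()
```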

Yes, you are right, thank you very much!