I have a video frame (W,H,3) and an optical flow (W,H,2). I want to implement a spatial transform layer to compute a new frame warped from the flow. Can someone give me some advice? Thank you.
You can use affine_grid() followed by grid_sample() to perform the warping. Here is the documentation: http://pytorch.org/docs/master/_modules/torch/nn/functional.html#grid_sample
In your case you already have the flow field, so applying grid_sample(Video_frame, optical_flow) would suffice. However, the flow field should be in the range [-1, 1], assuming normalized coordinates of the image/video.
Hope that helps
Could you show how to normalize the flow field to the range [-1, 1]? Thanks a lot!
Hi, I wonder how to normalize the flow field to the range [-1, 1].
Hi @IanYeung and @hubertlee915,
I also had the same problem, but I guess this is how to address it:
Suppose your image has a shape of [H, W]. Then you would need to normalize the values of the horizontal flow map by W and those of the vertical flow map by H (strictly, scale by 2/W and 2/H, since each normalized axis spans 2 units, from -1 to 1). If you want to do warping, this normalized flow map should be added to an x-y meshgrid over [-1, 1] with shape [H, W], something like np.meshgrid(np.linspace(-1,1,W), np.linspace(-1,1,H)). The resulting map can then be passed as the grid argument to the grid_sample function.
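Here is a minimal sketch of that recipe in PyTorch. The function name warp_with_flow is my own, and I am assuming the flow is given in pixel units, channels-first with shape (N, 2, H, W), x-displacement in channel 0 and y-displacement in channel 1:

```python
import torch
import torch.nn.functional as F

def warp_with_flow(frame, flow):
    """Warp frame (N, C, H, W) with a pixel-space flow (N, 2, H, W).

    flow[:, 0] is the horizontal (x) displacement in pixels,
    flow[:, 1] is the vertical (y) displacement in pixels.
    """
    n, _, h, w = frame.shape

    # Base identity grid in normalized [-1, 1] coordinates, (x, y) order.
    ys, xs = torch.meshgrid(
        torch.linspace(-1.0, 1.0, h),
        torch.linspace(-1.0, 1.0, w),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1)            # (H, W, 2)
    base = base.unsqueeze(0).expand(n, -1, -1, -1)  # (N, H, W, 2)

    # Scale pixel displacements to the normalized grid: each axis spans
    # 2 units over W-1 (or H-1) pixel steps under align_corners=True.
    scale = torch.tensor([2.0 / max(w - 1, 1), 2.0 / max(h - 1, 1)])
    grid = base + flow.permute(0, 2, 3, 1) * scale  # (N, H, W, 2)

    return F.grid_sample(frame, grid, align_corners=True)
```

With zero flow this reduces to the identity warp, which is a quick sanity check that the normalization is right.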
Thanks! It makes sense. But I think we do not need the meshgrid function, since the flow field (of size N x H x W x 2) is already a grid.
The flow field is indeed a grid of the same size. However, the grid_sample function is a sampler that maps pixels in the input argument according to their corresponding positions in the grid argument. You need a neutral grid as the base sampler that maps the image to itself; the flow field comes on top of that and indicates how each pixel has moved. Suppose the flow is zero and you want to map the image to itself: in that case, an all-zero grid would produce a nonsense output, because every output pixel would sample the centre of the input. You can try both, flow only versus flow plus the [-1, 1] meshgrid, and compare the results.
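To make that comparison concrete, here is a small experiment of my own showing the two cases: an all-zero grid collapses every output pixel onto the centre of the image, while an identity [-1, 1] meshgrid reproduces the image exactly:

```python
import torch
import torch.nn.functional as F

img = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

# All-zero grid: every output location samples normalized (0, 0),
# i.e. the centre of the image -- a constant, nonsense output.
zero_grid = torch.zeros(1, 4, 4, 2)
out_zero = F.grid_sample(img, zero_grid, align_corners=True)

# Identity grid built from a [-1, 1] meshgrid: the image maps to itself.
ys, xs = torch.meshgrid(
    torch.linspace(-1.0, 1.0, 4),
    torch.linspace(-1.0, 1.0, 4),
    indexing="ij",
)
identity = torch.stack((xs, ys), dim=-1).unsqueeze(0)  # (1, 4, 4, 2)
out_id = F.grid_sample(img, identity, align_corners=True)
```

Here out_zero is a constant image (the bilinear sample at the centre), while out_id matches img, which is exactly why the flow must be added on top of the identity grid.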
Just note that in my case, the image input argument needed to be normalized to [0, 1] rather than [0, 255].
Yes, you are right, thank you very much!