Image warping for backward flow using forward flow (optical flow)


The code I’m using to warp the image is

B, C, H, W = image_B.size()

xx = torch.arange(0, W).view(1, -1).repeat(H, 1)
yy = torch.arange(0, H).view(-1, 1).repeat(1, W)
xx = xx.view(1, 1, H, W).repeat(B, 1, 1, 1)
yy = yy.view(1, 1, H, W).repeat(B, 1, 1, 1)
grid = torch.cat((xx, yy), 1).float()

vgrid = grid + flow_AB

# normalize the sampling coordinates to [-1, 1] for grid_sample
vgrid[:, 0, :, :] = 2.0 * vgrid[:, 0, :, :].clone() / max(W - 1, 1) - 1.0
vgrid[:, 1, :, :] = 2.0 * vgrid[:, 1, :, :].clone() / max(H - 1, 1) - 1.0

warped_image = torch.nn.functional.grid_sample(image_B, vgrid.permute(0, 2, 3, 1), mode='nearest')
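For reference, the same logic wrapped into a reusable function (the name `backward_warp` and the bilinear default are my own choices, not from the original code). With an all-zero flow it should return the input unchanged, which makes for a quick sanity check:

```python
import torch
import torch.nn.functional as F

def backward_warp(image, flow, mode='bilinear'):
    """Sample `image` at coordinates x + flow(x); flow is (B, 2, H, W) in pixels."""
    B, C, H, W = image.size()
    xx = torch.arange(W).view(1, -1).repeat(H, 1)
    yy = torch.arange(H).view(-1, 1).repeat(1, W)
    grid = torch.stack((xx, yy), 0).float()            # (2, H, W) pixel grid
    vgrid = grid.unsqueeze(0) + flow                   # absolute sampling coords
    # normalize to [-1, 1] as grid_sample expects
    vgrid = torch.stack((2 * vgrid[:, 0] / max(W - 1, 1) - 1,
                         2 * vgrid[:, 1] / max(H - 1, 1) - 1), dim=1)
    return F.grid_sample(image, vgrid.permute(0, 2, 3, 1),
                         mode=mode, align_corners=True)

# sanity check: zero flow should leave the image unchanged
img = torch.rand(1, 3, 4, 5)
out = backward_warp(img, torch.zeros(1, 2, 4, 5))
```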

I’ve also tried creating the flow explicitly with classical image processing:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img1 = cv2.imread('frame_0010.png', 0)
img2 = cv2.imread('frame_0011.png', 0)

# flow from frame 10 to frame 11, so that warped(x) = image2[x + flow(x)]
flow = cv2.calcOpticalFlowFarneback(img1, img2, None, 0.5, 3, 15, 3, 5, 1.2, 0)

image2 = cv2.imread('frame_0011.png', 1)

# cv2.remap expects absolute sampling coordinates, not displacements,
# so the flow has to be added to a pixel coordinate grid (kept float32)
h, w = flow.shape[:2]
grid = np.stack(np.meshgrid(np.arange(w), np.arange(h)), axis=-1).astype(np.float32)
plt.imshow(cv2.remap(image2, grid + flow, None, cv2.INTER_LINEAR))

But it doesn’t hold up.
This is the difference from the image I want to warp to:


I don’t get why it’s not working. Could anyone please help me?

This is the image I’m warping to (imageA) and the image I’m warping from (imageB)

When I use grid_sample and flow_AB (the flow from A to B), I get this as the result:
Notice the failed warping at the knife.

Is this some kind of reference dataset for optical flow models?
If I’m not mistaken, I’ve seen these images before in a similar question where the warping at the knife was also the issue (although I cannot find it at the moment).

I’m not sure, but it was the easiest example I could grab for testing, so I went with it.

But the warping at the knife is still very much an issue.
I’ve now tried a pure cv2 implementation, and there’s no longer any difference in the warped image produced by remap.
And when I use the flow field from cv2 instead of the .flo file, grid_sample gives a similar result.
I don’t understand the difference between grid_sample and remap; I think that’s crucial here.
I’ve looked around the forums for similar questions, but they’ve all been left unresolved.

Hello there.
I don’t see anything wrong with the warped image.
The shoulder and the knife are doubled, but since those (partly doubled) regions are occluded, that artifact is expected in the warped image: the background is partly occluded by the shoulder and by the knife.

For backward warping with the forward flow, we look at each image coordinate x, read the flow u for that pixel, and copy the pixel value at the corresponding location (x + u) in the second image into that initial coordinate x. Assume the background has no movement: its flow is then zero, and the second image’s pixel values at those very same coordinates are copied over.
In other words:
warped_img(x) = img2[x + flow(x)].
And to have a better interpolation, bilinear interpolation could be used instead of nearest neighbor in grid_sample.
Are the copied parts the reason you assume something is wrong? If not, sorry for the extra explanation.

It’s the Sintel dataset

Thanks for your nice explanation. However, I’ve seen doubled regions even with the Sintel ground-truth flow, which shouldn’t happen, should it? This really baffles me.

This is normal with the ground-truth flow 🙂
Since the background in the image has no movement (or moves only slightly), the doubled area is the part of the background that is visible in the first image but occluded by the moving object in the second image.
To make it simpler, imagine an image with a box in the middle on a still background (a background with no movement), and assume the box moves 5 pixels to the right between the first and the second image.
To warp the second image toward the first with the ground-truth flow, you look at each coordinate of the first image, read the flow for that pixel, and replace the pixel’s color with the color of the corresponding pixel in the second image.
Since the background’s flow is zero, the part of the background that is visible in the first image but occluded by the object in the second image is copied straight from the second image, showing parts of the moving object. The whole box is also copied back to its initial position. So in the resulting image you get the box plus the part of the box that covered those 5 pixels of background in the second image.
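The box example above can be reproduced in one dimension with a few lines of numpy (a toy sketch of my own, not from the thread), which makes the doubling directly visible:

```python
import numpy as np

W = 12
img1 = np.zeros(W); img1[3:6] = 1.0   # frame 1: box at columns 3-5
img2 = np.zeros(W); img2[8:11] = 1.0  # frame 2: box moved 5 px to the right

# ground-truth forward flow on frame-1 coordinates:
# the box moves +5 pixels, the background does not move
flow = np.zeros(W, dtype=int)
flow[3:6] = 5

# backward warping: warped(x) = img2[x + flow(x)]
warped = img2[np.arange(W) + flow]
# the box shows up twice: at its frame-1 position (columns 3-5) and in the
# disoccluded background region (columns 8-10) - exactly the doubling artifact
```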


Thank you! Your explanation is very clear. Now I realize what I was missing. It seems warping is still somewhat tricky.