I want to apply an affine transform to a 2D image based on its estimated depth map. That means if I shift the image to the right, objects that are close should be shifted more than objects in the background. The same should hold for a zoom (scaling).
Unfortunately, I can’t get it to work correctly at the moment.
So what I have is an image of shape (512, 512, 3) and a depth map of shape (512, 512).
How I see it should be done:
- Construct an affine transform matrix, e.g. with Kornia's `get_projective_transform`
- Construct a flow field from it using `torch.nn.functional.affine_grid`
- Apply the flow field using `torch.nn.functional.grid_sample`
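For reference, the depth-free version of these steps works like this in plain PyTorch (a minimal sketch with a placeholder image and a hand-built 2x3 matrix instead of Kornia):

```python
import torch
import torch.nn.functional as F

img = torch.rand(1, 3, 512, 512)  # placeholder image batch (N, C, H, W)

# 2x3 affine matrix for a 50% zoom-in; affine_grid's theta maps
# output coordinates to input coordinates, hence the 1/1.5 scale
theta = torch.tensor([[[1 / 1.5, 0.0, 0.0],
                       [0.0, 1 / 1.5, 0.0]]])

# Flow field of normalized sampling coordinates, shape (1, 512, 512, 2)
grid = F.affine_grid(theta, [1, 3, 512, 512], align_corners=False)

# Sample the image at those coordinates
out = F.grid_sample(img, grid, padding_mode="zeros", align_corners=False)
```

The open question is only where the depth map enters this pipeline.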
My initial idea was to multiply the flow field by the depth map. But I realized that this way even an identity transform would be warped by the depth map. So instead I scale both the affine and the identity flow fields by the depth map and take their difference. For some reason, this does not work either…
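The identity-grid problem can be reproduced in isolation with toy sizes and plain PyTorch (no Kornia, hypothetical depth values):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
img = torch.rand(1, 3, 4, 4)

# Identity affine: sampling with its grid reproduces the image
identity_theta = torch.tensor([[[1.0, 0.0, 0.0],
                                [0.0, 1.0, 0.0]]])
grid = F.affine_grid(identity_theta, [1, 3, 4, 4], align_corners=False)

# Toy depth map with values in (0, 1]
depth = torch.linspace(0.1, 1.0, 16).reshape(1, 4, 4, 1)

same = F.grid_sample(img, grid, align_corners=False)
# Multiplying the identity grid by depth shifts all sample points
# toward the center, so the "identity" warp is no longer identity
changed = F.grid_sample(img, grid * depth, align_corners=False)
```

Here `same` matches `img` (up to interpolation), while `changed` does not, which is exactly the effect I am trying to cancel out with the difference-field idea.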
Here’s what I have so far:
```python
import numpy as np
import torch
import torch.nn.functional as F
import kornia
from PIL import Image

# Load the image and depth map
img_path = "test.png"
depth_path = "test_depth.png"
img = torch.tensor(np.array(Image.open(img_path))).permute(2, 0, 1).float() / 255
depth = np.array(Image.open(depth_path)).astype(np.float32) / 255

h, w = 512, 512

# Identity matrix and flow field (3D)
identity_matrix = kornia.geometry.transform.get_projective_transform(
    center=torch.tensor([[0.5, 0.5, 0.5]]),
    angles=torch.tensor([[0.0, 0.0, 0.0]]),
    scales=torch.tensor([[1.0, 1.0, 1.0]]),
)
coords_3d_identity = torch.nn.functional.affine_grid(
    identity_matrix, [1, 1, 1, h, w], align_corners=False
)
coords_2d_identity = coords_3d_identity[..., :2]

# 3D matrix and affine transform flow field - zoom in by 50%
matrix = kornia.geometry.transform.get_projective_transform(
    center=torch.tensor([[0.5, 0.5, 0.5]]),
    angles=torch.tensor([[0.0, 0.0, 0.0]]),
    scales=torch.tensor([[1.0, 1.0, 1.5]]),
)
coords_3d = torch.nn.functional.affine_grid(matrix, [1, 1, 1, h, w], align_corners=False)
coords_2d = coords_3d[..., :2]

# Scale both flow fields by the depth map
matched_depth = torch.tensor(depth).unsqueeze(-1).repeat(1, 1, 2).unsqueeze(0).unsqueeze(0)
multiplied_identity_field = coords_2d_identity * matched_depth
multiplied_transform_field = coords_2d * matched_depth

# Difference of the depth-scaled fields
diff_field = multiplied_transform_field - multiplied_identity_field

img_transformed = F.grid_sample(
    img.unsqueeze(0),
    diff_field.squeeze(0),
    padding_mode="zeros",  # or "reflection"
    align_corners=False,
)
```