A problem with transforming a point cloud to a depth map

I am trying to transform a point cloud into a depth map in PyTorch, but I don't know how to keep the gradient flow through this operation.
The inputs are an intrinsic matrix, an extrinsic matrix and a point cloud.

  1. First I multiply the intrinsic matrix by the extrinsic matrix to get a projection matrix.
  2. Then I multiply the point cloud by the projection matrix.
  3. I divide the x and y values by the z value (the depth) to get the pixel coordinates.
  4. Everything works fine until I try to write the result into a depth map: the pixel coordinates are float, but tensor indices must be of bool, int or long type, so I have to cast them to long in order to put the depth values at the correct positions.

The problem is that long tensors cannot carry gradients, and the indexing operation cannot be backpropagated through the indices.
When I compute a loss on the generated depth map, the backward pass behaves abnormally: I suppose the gradient only flows through the depth values back to the z values of the transformed point cloud, which barely change during the projection, while the pixel coordinates receive no gradient at all.
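Here is a toy example of what I mean (made-up numbers, not my real data):

import torch

coords = torch.tensor([1.7, 3.2], requires_grad=True)   # float pixel coordinates
values = torch.tensor([0.5, 2.0], requires_grad=True)   # depth values

grid = torch.zeros(6)
idx = coords.long()   # the cast to long detaches coords from the graph
grid[idx] = values    # indexing: the graph only goes through values

grid.sum().backward()
print(values.grad)    # tensor([1., 1.])
print(coords.grad)    # None -> no gradient w.r.t. the pixel coordinates
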
Has anyone found a solution for this?
Thanks a lot in advance!

import torch
import numpy as np

# Load the raw scan (one x, y, z, intensity row per point), overwrite the
# intensity with 1 to get homogeneous coordinates, and keep only points with x >= 0.
filename = './0000000000.bin'
points = np.fromfile(filename, dtype=np.float32).reshape(-1, 4)
points[:, 3] = 1.0
points = points[points[:, 0] >= 0, :]

points = torch.Tensor(points).cuda()
points.requires_grad_(True)  # gradients should flow back to the point cloud

# Camera intrinsics (4x4).
intrinsic_matrix = np.array(
    [7.215377e+02, 0.000000e+00, 6.095593e+02, 0.000000e+00,
     0.000000e+00, 7.215377e+02, 1.728540e+02, 0.000000e+00,
     0.000000e+00, 0.000000e+00, 1.000000e+00, 0.000000e+00,
     0.000000e+00, 0.000000e+00, 1.000000e+00, 1.000000e+00]).reshape(4, 4)
intrinsic_matrix = torch.Tensor(intrinsic_matrix).cuda()

fx = intrinsic_matrix[0, 0].item()  # focal lengths, used to rescale the extra translation below
fy = intrinsic_matrix[1, 1].item()

# Extrinsics: rotation R and translation T, plus an extra correction matrix.
extrinsic_matrix_R = np.array(
    [7.533745e-03, -9.999714e-01, -6.166020e-04, 1.480249e-02, 7.280733e-04, -9.998902e-01, 9.998621e-01, 7.523790e-03,
     1.480755e-02]).reshape(3, 3)
extrinsic_matrix_T = np.array([-4.069766e-03, -7.631618e-02, -2.717806e-01]).reshape(3, 1)

extrinsic_matrix = np.vstack((np.hstack((extrinsic_matrix_R, extrinsic_matrix_T)), np.array([[0, 0, 0, 1]])))
extrinsic_matrix_fixed = np.array([1.0, 0.0, 0.0, 4.485728e+01 / fx,
                                   0.0, 1.0, 0.0, 2.163791e-01 / fy,
                                   0.0, 0.0, 1.0, 2.745884e-03,
                                   0.0, 0.0, 0.0, 1.0]).reshape(4, 4)
extrinsic_matrix = torch.Tensor(np.matmul(extrinsic_matrix_fixed, extrinsic_matrix)).cuda()

# Full 3x4 projection matrix and projection of the homogeneous points.
projection_matrix = torch.matmul(intrinsic_matrix, extrinsic_matrix)[:3, :]
print(projection_matrix)

cam_points = torch.matmul(projection_matrix, points.T)
# Perspective divide: float pixel coordinates (u, v).
eps = 1e-4
pix_coords = cam_points[:2, :] / (cam_points[2, :] + eps)

# Scatter the depth values into the image; the .long() cast is where the
# gradient flow to the pixel coordinates is lost.
depth_map = torch.zeros(375, 1242).cuda()
for i in range(pix_coords.shape[1]):
    u, v = pix_coords[0, i].long(), pix_coords[1, i].long()
    if 0 < u < 1242 and 0 < v < 375:
        depth_map[v, u] = cam_points[2, i]
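
For reference, the same fill can be written without the Python loop; it does not change the gradient situation, since the indices are still long:

u = pix_coords[0, :].long()
v = pix_coords[1, :].long()
valid = (u > 0) & (u < 1242) & (v > 0) & (v < 375)
depth_map = torch.zeros(375, 1242).cuda()
depth_map[v[valid], u[valid]] = cam_points[2, valid]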

Hi, I’m facing the same problem, have you solved this?
Thanks in advance!!

Hi. Unfortunately, I haven't solved the problem directly. Instead I use an alternative way that keeps the gradient flow in a point-cloud-like format. After projection we get the 2D coordinates and the corresponding depth values, so I keep the prediction in (u, v, z) form, transform the ground-truth depth map into the same format, and then compute the loss between the two. The most important step is to match the predicted (u, v, z) points with the ground-truth (u, v, z) points according to your task; specifically, it comes down to the point-to-pixel relationship, i.e. which point should be associated with which pixel. I hope this is helpful.
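
Just as an illustration (not exactly my code; the nearest-pixel matching here is only a placeholder for whatever point-to-pixel correspondence your task needs), assuming pix_coords / cam_points from the projection above and a ground-truth depth map gt_depth of shape (375, 1242):

def uvz_loss(pix_coords, cam_points, gt_depth):
    u = pix_coords[0, :]   # float, keeps gradient
    v = pix_coords[1, :]   # float, keeps gradient
    z = cam_points[2, :]   # float, keeps gradient

    # Match every predicted point to its nearest ground-truth pixel.
    # No gradient flows through the integer lookup itself, only through z
    # (and through u, v if you also put them into the loss).
    ui = u.round().long().clamp(0, gt_depth.shape[1] - 1)
    vi = v.round().long().clamp(0, gt_depth.shape[0] - 1)
    gt_z = gt_depth[vi, ui]

    # Keep only points that fall inside the image and on valid ground-truth pixels.
    valid = (gt_z > 0) & (u >= 0) & (u < gt_depth.shape[1]) & \
            (v >= 0) & (v < gt_depth.shape[0])

    return (z - gt_z)[valid].abs().mean()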