Torch.matmul can't backpropagate

Abdelrahman_Akram · June 2, 2021, 6:24pm

In this code:

import numpy as np
import torch

def rotation_points_single_angle_cuda(points, angle, axis=0):
    # points: [N, 3] CUDA tensor; angle in radians
    rot_sin = np.sin(angle)
    rot_cos = np.cos(angle)
    if axis == 1:
        rot_mat_T = np.array(
            [[rot_cos, 0, -rot_sin], [0, 1, 0], [rot_sin, 0, rot_cos]],
        )
    elif axis == 2 or axis == -1:
        rot_mat_T = np.array(
            [[rot_cos, -rot_sin, 0], [rot_sin, rot_cos, 0], [0, 0, 1]],
        )
    elif axis == 0:
        rot_mat_T = np.array(
            [[1, 0, 0], [0, rot_cos, -rot_sin], [0, rot_sin, rot_cos]],
        )
    else:
        raise ValueError("axis should in range")

    # Out-of-place matmul: returns a new tensor and leaves `points` untouched
    points_ = torch.matmul(points, torch.from_numpy(rot_mat_T).float().cuda())

    return points_

It gives me an error saying that a value was changed in-place.

This is the error:

[W python_anomaly_mode.cpp:60] Warning: Error detected in MmBackward. Traceback of forward call that caused the error:
  File "train.py", line 137, in <module>
    train(args)
  File "train.py", line 121, in train
    batch = COR(infos, sample["idx"], pred_disp ,pipeline )
  File "/notebooks/E2E/cor_interface.py", line 52, in COR
    res, _ = pipeline(res_temp, infos[idx[i]])
  File "/notebooks/cia/det3d/datasets/pipelines/compose.py", line 23, in __call__
    res, info = t(res, info)
  File "/notebooks/cia/det3d/datasets/pipelines/preprocess_v4.py", line 170, in __call__
    gt_dict["gt_boxes"], points = prep.global_rotation(gt_dict["gt_boxes"], points,
  File "/notebooks/cia/det3d/core/sampler/preprocess.py", line 826, in global_rotation
    points[:, :3] = box_np_ops.rotation_points_single_angle_cuda(points[:, :3], noise_rotation, axis=2)
  File "/notebooks/cia/det3d/core/bbox/box_np_ops.py", line 453, in rotation_points_single_angle_cuda
    points_ = torch.matmul(points,torch.from_numpy(rot_mat_T).float().cuda())
 (function print_stack)
Traceback (most recent call last):
  File "train.py", line 137, in <module>
    train(args)
  File "train.py", line 129, in train
    total_loss.backward()
  File "/opt/conda/envs/cia/lib/python3.8/site-packages/torch/tensor.py", line 185, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/opt/conda/envs/cia/lib/python3.8/site-packages/torch/autograd/__init__.py", line 125, in backward
    Variable._execution_engine.run_backward(
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [17045, 3]], which is output 0 of SliceBackward, is at version 4; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

ptrblck · June 2, 2021, 6:36pm

Could you check the next operations applied on points_, which might manipulate it inplace?
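
One way to run that check, as a minimal sketch: every tensor carries a version counter (exposed as the `_version` attribute) that increases whenever the tensor is modified in-place, and this is the counter the error message refers to ("is at version 4; expected version 1"). The tensor and variable names below are made up for illustration and are not taken from the code in this thread.

```python
import torch

# Hypothetical stand-in for the point cloud: 4 points, 4 features each.
points = torch.zeros(4, 4)
version_start = points._version

# Out-of-place op: creates a new tensor, `points` itself is not modified.
shifted = points + 1.0
version_after_out_of_place = points._version

# Slice assignment writes into the existing storage, i.e. it is in-place.
points[:, :3] = 1.0
version_after_slice_assign = points._version

# Methods with a trailing underscore are in-place as well.
points.add_(1.0)
version_after_add_ = points._version

print(version_start, version_after_out_of_place,
      version_after_slice_assign, version_after_add_)
```

Printing `_version` around each step following the matmul should show which operation bumps the counter.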

Abdelrahman_Akram · June 2, 2021, 6:40pm

def global_rotation(gt_boxes, points, rotation=np.pi / 4):
    if not isinstance(rotation, list):
        rotation = [-rotation, rotation]
    noise_rotation = np.random.uniform(rotation[0], rotation[1])
    points[:, :3] = box_np_ops.rotation_points_single_angle_cuda(points[:, :3], noise_rotation, axis=2)
    gt_boxes[:, :3] = box_np_ops.rotation_points_single_angle(gt_boxes[:, :3], noise_rotation, axis=2)
    if gt_boxes.shape[1] > 7:
        gt_boxes[:, 6:8] = box_np_ops.rotation_points_single_angle(
            np.hstack([gt_boxes[:, 6:8], np.zeros((gt_boxes.shape[0], 1))]),
            noise_rotation,
            axis=2,
        )[:, :2]
    gt_boxes[:, -1] += noise_rotation
    return gt_boxes, points

In this function?

            gt_dict["gt_boxes"], points = prep.random_flip(gt_dict["gt_boxes"], points)
            gt_dict["gt_boxes"], points = prep.global_translate_(gt_dict["gt_boxes"], points, self.global_translate_noise_std)
            gt_dict["gt_boxes"], points = prep.global_rotation(gt_dict["gt_boxes"], points,
                                                               rotation=self.global_rotation_noise)
            gt_dict["gt_boxes"], points = prep.global_scaling_v2(gt_dict["gt_boxes"], points,
                                                                 *self.global_scaling_noise)

Only global_rotation gives the error; if I comment it out, it works.

ptrblck · June 3, 2021, 12:29am

I’m not sure whether this operation:

points[:, :3] = box_np_ops.rotation_points_single_angle_cuda

would use the previous points_ tensor, but if so, it could be problematic, since it changes the tensor in-place. You might want to create a new tensor instead.
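
A minimal sketch of what "create a new tensor instead" could look like, assuming `points` is an [N, C] tensor whose first three columns are xyz. The helper names `rotate_z` and `global_rotation_out_of_place` are hypothetical and not part of det3d; the sketch runs on whatever device `points` lives on and builds the rotation matrix directly in torch. Whether the original slice assignment actually raises may depend on the PyTorch version, so this only shows the out-of-place alternative, not a reproduction of the error.

```python
import math
import torch


def rotate_z(points_xyz: torch.Tensor, angle: float) -> torch.Tensor:
    """Rotate [N, 3] points about the z-axis and return a new tensor."""
    c, s = math.cos(angle), math.sin(angle)
    # Same matrix layout as the axis == 2 branch in the question.
    rot_mat_T = torch.tensor(
        [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]],
        dtype=points_xyz.dtype,
        device=points_xyz.device,
    )
    return torch.matmul(points_xyz, rot_mat_T)


def global_rotation_out_of_place(points: torch.Tensor, angle: float) -> torch.Tensor:
    """Hypothetical replacement for `points[:, :3] = ...`.

    Instead of writing the rotated coordinates back into `points`,
    build a new tensor from the rotated xyz and the untouched columns.
    """
    rotated_xyz = rotate_z(points[:, :3], angle)
    return torch.cat([rotated_xyz, points[:, 3:]], dim=1)


# Usage: `points` is a non-leaf tensor that is part of the autograd graph.
base = torch.randn(5, 4, requires_grad=True)
points = base * 1.0
rotated_points = global_rotation_out_of_place(points, math.pi / 4)
rotated_points.sum().backward()
print(base.grad.shape)
```

Since nothing is written into `points`, any tensor that autograd saved during the forward pass stays at the version it expects. The same pattern would need to be applied to any later augmentation step that also assigns into `points` in-place.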