Building transformation matrix fails - one of the variables needed for gradient computation has been modified by an inplace operation

Hi,

I have a little network that generates 4 values (angle, scale, translation x, translation y) that I want to use to build an affine transformation matrix. I am doing it like this (the forward pass works, but backprop fails):

trans_matrix = torch.stack([
    trans_params[:, 1] * torch.cos(trans_params[:, 0]), trans_params[:, 1] * -1 * torch.sin(trans_params[:, 0]), trans_params[:, 2],
    trans_params[:, 1] * torch.sin(trans_params[:, 0]), trans_params[:, 1] *      torch.cos(trans_params[:, 0]), trans_params[:, 3]
]).view(-1, 2, 3)

I get this error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

This is the line that fails:

trans_params[:, 1] * torch.sin(trans_params[:, 0]), trans_params[:, 1] *      torch.cos(trans_params[:, 0]), trans_params[:, 3]

I guess the slicing operations are the problem here. But how can I correctly compose a matrix from sin and cos applied to certain columns of another tensor?
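To rule out the construction itself, I tried it standalone with a fresh trans_params and nothing else touching it (here I stacked along dim=-1 so that each sample's six entries stay together before the view; this is my own sketch, not verbatim from my network), and backward runs without the error:

```python
import torch

# Standalone check: same construction, fresh parameters, no other
# ops writing into trans_params in place.
trans_params = torch.randn(5, 4, requires_grad=True)  # (batch, [angle, scale, tx, ty])
a, s = trans_params[:, 0], trans_params[:, 1]
tx, ty = trans_params[:, 2], trans_params[:, 3]

trans_matrix = torch.stack([
    s * torch.cos(a), -s * torch.sin(a), tx,
    s * torch.sin(a),  s * torch.cos(a), ty,
], dim=-1).view(-1, 2, 3)  # (batch, 2, 3)

trans_matrix.sum().backward()  # no inplace error in isolation
```

So slicing and stacking by themselves seem to backprop fine, which makes me think the in-place modification happens somewhere else.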

There is also another location in my code that causes this error message:

affine_params[:, 0] = -1.0 * affine_params[:, 0] # Fails
affine_params[:, 1] = 1 / affine_params[:, 1] # works

This is even more confusing to me. Why does the division work but not the multiplication?
affine_params is the same tensor type as trans_params in the first code example.

Is there maybe even a more elegant way to construct an affine transformation matrix?

If I do affine_params[:, 0] = -1.0 * affine_params[:, 0].clone() instead of affine_params[:, 0] = -1.0 * affine_params[:, 0], the line works. But why? And why does the division work without .clone()?
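As a workaround I rewrote both assignments out of place, so that affine_params itself is never modified (and, if I understand the mechanism right, its version counter never changes). This is just my sketch of that idea:

```python
import torch

affine_params = torch.randn(5, 4, requires_grad=True)

# Out-of-place version of the two assignments above: build a new
# tensor from the columns instead of writing into affine_params.
c0, c1, c2, c3 = affine_params.unbind(dim=1)
affine_params_new = torch.stack([-c0, 1.0 / c1, c2, c3], dim=1)

affine_params_new.sum().backward()  # backprop works, original stays intact
```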

The other code snippet works when I do

trans_params = trans_params.clone()
trans_matrix = torch.stack([
    trans_params[:, 1] * torch.cos(trans_params[:, 0]), trans_params[:, 1] * -1 * torch.sin(trans_params[:, 0]), trans_params[:, 2],
    trans_params[:, 1] * torch.sin(trans_params[:, 0]), trans_params[:, 1] *      torch.cos(trans_params[:, 0]), trans_params[:, 3]
]).view(-1, 2, 3)

instead of

trans_matrix = torch.stack([
    trans_params[:, 1] * torch.cos(trans_params[:, 0]), trans_params[:, 1] * -1 * torch.sin(trans_params[:, 0]), trans_params[:, 2],
    trans_params[:, 1] * torch.sin(trans_params[:, 0]), trans_params[:, 1] *      torch.cos(trans_params[:, 0]), trans_params[:, 3]
]).view(-1, 2, 3)

So cloning seems to fix the error message. But it also seems that the gradients no longer flow properly, because my loss is not going down.
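One thing I did verify on a toy example: clone by itself does not cut the graph, gradients flow back through it, so I don't understand why the loss stalls:

```python
import torch

x = torch.randn(4, requires_grad=True)
y = x.clone().sin().sum()  # clone is differentiable
y.backward()
# x.grad is populated, so the clone did not detach anything
```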

This also fails

trans_matrix = torch.zeros((trans_params.shape[0], 2, 3)).to(trans_params.device)
trans_matrix[:,0,0] = trans_params[:, 1] * torch.cos(trans_params[:, 0])
trans_matrix[:,0,1] = trans_params[:, 1] * -1 * torch.sin(trans_params[:, 0])
trans_matrix[:,0,2] = trans_params[:, 2]
trans_matrix[:,1,0] = trans_params[:, 1] * torch.sin(trans_params[:, 0])
trans_matrix[:,1,1] = trans_params[:, 1] * torch.cos(trans_params[:, 0])
trans_matrix[:,1,2] = trans_params[:, 3]

with

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

but interestingly not in the first line, but in this one:

trans_matrix[:,1,1] = trans_params[:, 1] * torch.cos(trans_params[:, 0])

Could it be that the actual error is not in this code snippet but somewhere earlier in my code? How can I find out what is going wrong?
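In the meantime I found torch.autograd.set_detect_anomaly(True), which makes backward() also print a traceback of the forward op whose saved tensor was overwritten. Here is a minimal repro of this error class (my guess at what is happening somewhere in my code; the tensors are made up):

```python
import torch

torch.autograd.set_detect_anomaly(True)  # backward now reports the guilty forward op

x = torch.randn(3, requires_grad=True)
z = x * 1.0   # non-leaf tensor, so the in-place write below is allowed
y = z.sin()   # sin saves z for its backward pass (d/dz sin(z) = cos(z))
z[0] = 0.0    # in-place write bumps z's version counter
try:
    y.sum().backward()
except RuntimeError as e:
    print(e)  # "... has been modified by an inplace operation ..."
```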