Failing to create a gradiented rotation matrix

I’m trying to find the rotation and translation which best maps one set of 3D points to another. My code:

    reg_verts_tensor = torch.from_numpy(reg_verts).float().to(device)
    verts_tensor = torch.from_numpy(verts).float().to(device)

    tensor_0 = torch.zeros(1).float().to(device)
    tensor_1 = torch.ones(1).float().to(device)


    criterion = torch.nn.MSELoss(reduction='sum')

    n_register_epochs = 1000
    register_initial_learning_rate = 0.001

    register_learning_rate = register_initial_learning_rate
    optimizer = torch.optim.SGD([translation, rotation_x, rotation_y, rotation_z], lr=register_learning_rate)
    
    loss_list = []

    for epoch in range(n_register_epochs):
        optimizer.zero_grad()

        epoch_loss = None

        transformed_verts = verts_tensor + translation

        rotation_matrix_x = torch.stack([torch.stack([tensor_1,tensor_0,tensor_0]),
                        torch.stack([tensor_0,torch.cos(rotation_x), -torch.sin(rotation_x)]),
                        torch.stack([tensor_0, torch.sin(rotation_x), torch.cos(rotation_x)])]).reshape(3,3)

        rotation_matrix_y = torch.stack([torch.stack([torch.cos(rotation_y), tensor_0, - torch.sin(rotation_y)]),
                            torch.stack([tensor_0, tensor_1, tensor_0]),
                            torch.stack([torch.sin(rotation_y), tensor_0, torch.cos(rotation_y)])]).reshape(3,3)

        rotation_matrix_z = torch.stack([torch.stack([torch.cos(rotation_z), torch.sin(rotation_z), tensor_0]),
                            torch.stack([-torch.sin(rotation_z), torch.cos(rotation_z), tensor_0]),
                            torch.stack([tensor_0,tensor_0,tensor_1])]).reshape(3,3)

        for i in range(transformed_verts.shape[0]):
            transformed_verts[i] = torch.matmul(rotation_matrix_x, transformed_verts[i])
            transformed_verts[i] = torch.matmul(rotation_matrix_y, transformed_verts[i])
            transformed_verts[i] = torch.matmul(rotation_matrix_z, transformed_verts[i])

        #

        loss = criterion(transformed_verts, reg_verts_tensor)


        epoch_loss = loss

        if epoch % 10 == 0:
            print(epoch_loss.item())

        epoch_loss.backward()

This fails on the “backwards” call with:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [3]], which is output 0 of AsStridedBackward0, is at version 1323; expected version 1322 instead.

What am I doing wrong?

The detector says it’s here:

transformed_verts[i] = torch.matmul(rotation_matrix_z, transformed_verts[i])

What’s the gradientable way of doing that?

This works:

        lines = []

        for i in range(transformed_verts.shape[0]):
            res = torch.matmul(rotation_matrix_x, transformed_verts[i])
            res = torch.matmul(rotation_matrix_y, res)
            res = torch.matmul(rotation_matrix_z, res)

            lines.append(res)

        transformed_verts = torch.stack(lines)

Hi Ian!

As you’ve noted in your subsequent post, at issue is that you are modifying
transformed_verts “inplace” by writing into it through an index. Depending
on what else is going on in autograd’s “computation graph,” this can block
backpropagation.

This solution works.

But why?

Remember that python “variables” are references to python objects in
memory.

You start with a python object – specifically a pytorch tensor – in memory,
and it is referred to by the python “variable” transformed_verts.

In your original, non-working version, you modify this specific pytorch
tensor (and pytorch rightly complains).

In your second, working version, you don’t modify the original pytorch
tensor. Instead you create a new pytorch tensor by calling torch.stack().
You then assign it to the python variable transformed_verts. But this
merely makes the variable transformed_verts refer to the new tensor.
The old – and, importantly, unmodified – tensor still remains in memory
and is referred to by some variable or another in autograd’s computation
graph.

Finally (I’m assuming that verts_tensor, and hence transformed_verts,
are batches of of vertices with shape [nBatch, 3].), you don’t need to use
a python loop over the batch elements to multiply the vertices by your
rotation matrix one by one – you can do it all at once (more efficiently)
with a single call to matmul() that broadcasts over the batch dimension.
But you need to use unsqueeze() (or similar reshaping) to convert
transformed_verts to a batch of “column vectors” so that matmul()
understands what you want to multiply.

(It’s also probably marginally more efficient to pre-multiply your x, y, and
z rotation matrices into a single rotation matrix and use that to rotate your
vertices.)

Here is an illustration:

>>> import torch
>>> torch.__version__
'1.10.2'
>>>
>>> _ = torch.manual_seed (2022)
>>>
>>> transformed_verts = torch.randn (5, 3)
>>>
>>> rotation_matrix_x = torch.randn (3, 3, requires_grad = True)
>>> rotation_matrix_y = torch.randn (3, 3, requires_grad = True)
>>> rotation_matrix_z = torch.randn (3, 3, requires_grad = True)
>>>
>>> # version that loops over batch dimension
...
>>> lines = []
>>>
>>> for i in range(transformed_verts.shape[0]):
...     res = torch.matmul(rotation_matrix_x, transformed_verts[i])
...     res = torch.matmul(rotation_matrix_y, res)
...     res = torch.matmul(rotation_matrix_z, res)
...
...     lines.append(res)
...
>>> transformed_vertsA = torch.stack(lines)
>>>
>>> transformed_vertsA.sum().backward()
>>>
>>> gradA = rotation_matrix_x.grad
>>>
>>> rotation_matrix_x.grad = None
>>> rotation_matrix_y.grad = None
>>> rotation_matrix_z.grad = None
>>>
>>> # version that calls matmul() once (with a single pre-multiplied rotation matrix)
...
>>> rotation_matrix = rotation_matrix_z @ rotation_matrix_y @ rotation_matrix_x
>>> transformed_vertsB = torch.matmul (rotation_matrix, transformed_verts.unsqueeze (2)).squeeze()
>>>
>>> transformed_vertsB.sum().backward()
>>>
>>> gradB = rotation_matrix_x.grad
>>>
>>> transformed_vertsA
tensor([[ -0.6688,   0.7012,  -0.2841],
        [ -3.5321,   4.6106,  -3.5176],
        [  7.7354, -11.2853,   6.8841],
        [  0.8983,  -0.3446,  -0.3980],
        [ -0.5089,   0.5077,  -0.4176]], grad_fn=<StackBackward0>)
>>> transformed_vertsB
tensor([[ -0.6688,   0.7012,  -0.2841],
        [ -3.5321,   4.6106,  -3.5176],
        [  7.7354, -11.2853,   6.8841],
        [  0.8983,  -0.3446,  -0.3980],
        [ -0.5089,   0.5077,  -0.4176]], grad_fn=<SqueezeBackward0>)
>>>
>>> gradA
tensor([[-3.6647,  1.2165,  0.3677],
        [-1.7096,  0.5675,  0.1715],
        [-4.3082,  1.4301,  0.4323]])
>>> gradB
tensor([[-3.6647,  1.2165,  0.3677],
        [-1.7096,  0.5675,  0.1715],
        [-4.3082,  1.4301,  0.4323]])

Best.

K. Frank