# Issues with gradient computed with torch.autograd

I have a rotation matrix of shape [B, 2, 3], where B is the batch size. This matrix is parameterized by the rotation angle θ of shape [B, 1]. If I use torch.autograd to compute the gradient of this rotation matrix with respect to θ, I end up with a gradient of shape [B, 1]. Shouldn’t this gradient be of shape [B, 2, 3] instead (where we simply take the gradient of each entry of the rotation matrix)? Here is a snippet of my code.

```python
import torch

batch_size = 16  # example value; not shown in the original snippet

torch_pi = torch.acos(torch.zeros(1)).item() * 2
theta = 2 * torch_pi * torch.rand((batch_size, 1), requires_grad=True) - torch_pi

rot_mat = torch.zeros((batch_size, 2, 3))
mask1 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask1[:, 0, 0] = True
mask2 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask2[:, 0, 1] = True
mask3 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask3[:, 1, 0] = True
mask4 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask4[:, 1, 1] = True

# fill the four rotation entries through the masks
rot_mat = rot_mat.masked_scatter(mask1, torch.cos(theta))
rot_mat = rot_mat.masked_scatter(mask2, -torch.sin(theta))
rot_mat = rot_mat.masked_scatter(mask3, torch.sin(theta))
rot_mat = rot_mat.masked_scatter(mask4, torch.cos(theta))

grad = torch.autograd.grad(rot_mat, theta,
                           grad_outputs=torch.ones_like(rot_mat))[0]
# grad is of shape [B, 1] instead of [B, 2, 3]
```

I wrote another piece of code with for-loops, but it is too slow. I would rather not use for-loops, and I was wondering if it is possible to get the gradient matrix without them.

```python
# out is the rotation matrix flattened per sample, e.g. rot_mat.view(B, -1)
grad_norm = 0.0
for i in range(out.size(0)):
    for j in range(out.size(1)):
        grad = torch.autograd.grad(
            outputs=out[i, j],
            inputs=theta,
            retain_graph=True
        )[0]
        grad_norm += torch.sum(grad ** 2) + 1e-12
```

Hi,

theta is of shape [B, 1] here, right? Since it’s the input you are asking for gradients with respect to, it’s not surprising that the result has the same size.
Keep in mind that autograd.grad computes a vector-Jacobian product, where the vector is the grad_outputs that you provided.
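For instance (a minimal sketch with a toy function, not the code from the question):

```python
import torch

theta = torch.randn(4, 1, requires_grad=True)  # input of shape [B, 1], B = 4
out = torch.cat([torch.cos(theta), torch.sin(theta)], dim=1)  # output [B, 2]

# autograd.grad computes v^T @ J, where v = grad_outputs has the shape of
# the output; the result always has the shape of the input.
v = torch.ones_like(out)
g = torch.autograd.grad(out, theta, grad_outputs=v)[0]
print(g.shape)  # torch.Size([4, 1]) -- same as theta, not as out
```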

Seeking a clarification here: if I need to compute the gradient of the rotation matrix with respect to theta, should I swap inputs and outputs in torch.autograd.grad?

PS:
I just tried differentiating a plain rotation matrix of shape [2, 3] w.r.t. theta, and I end up with a single value again. If I swap inputs and outputs as described above, I get the error `One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.` Setting allow_unused=True returns a gradient of None.

I think there is some confusion about what the gradient/Jacobian are:
You have a function fn: theta -> rot_mat. For simplicity, we’ll assume the input and output are 1D vectors of size B and 6B respectively.
So the Jacobian of this function is a 2D matrix of size [6B, B], and what autograd computes is the product of a vector of size 6B with this matrix.
In your case, you gave a vector full of ones, so what you end up with is the sum of all the rows of the Jacobian, which is an output of size B.

When you say gradient here, I am not sure what you’re talking about, especially given that you expect its size to be 6B. The full Jacobian would contain 6B × B values.
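To make the "sum of the rows" point concrete, here is a small check with a toy function (sizes are illustrative, not from the original code):

```python
import torch
from torch.autograd.functional import jacobian

B = 3
theta = torch.randn(B, requires_grad=True)

def fn(t):
    # toy stand-in for theta -> rot_mat, flattened to a vector of size 2B
    return torch.cat([torch.cos(t), torch.sin(t)])

# vector-Jacobian product with a vector of ones...
out = fn(theta)
vjp = torch.autograd.grad(out, theta, grad_outputs=torch.ones_like(out))[0]

# ...matches the sum over the rows of the full [2B, B] Jacobian
J = jacobian(fn, theta)
assert J.shape == (2 * B, B)
assert torch.allclose(vjp, J.sum(dim=0))
```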

Sorry, I didn’t explain it correctly. Yes, I need the full Jacobian with 6B × B values. What is the most efficient way to compute it? I don’t want to use a double loop over all 6B × B values, and I assume it can’t be computed with a single backward pass of torch.autograd. Ultimately, I want to compute the L2 norm of this full Jacobian. If there’s a more efficient way to do that, let me know. I am still new to PyTorch and learning about it.

Hi,

Unfortunately, you will need a loop to get this: you provide a vector that contains a single 1 each time, to extract the Jacobian row by row.
Note that we provide a function to do that nicely here: https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.jacobian
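As a sketch of how that helper could look for the theta -> rot_mat case above (the `make_rot_mat` construction is an assumption: a standard 2D rotation with the translation column left at zero):

```python
import math

import torch
from torch.autograd.functional import jacobian

batch_size = 4
theta = torch.rand(batch_size, 1) * 2 * math.pi - math.pi

def make_rot_mat(theta):
    # theta: [B, 1] -> rot_mat: [B, 2, 3]
    c, s = torch.cos(theta), torch.sin(theta)
    zeros = torch.zeros_like(theta)
    row0 = torch.cat([c, -s, zeros], dim=1)
    row1 = torch.cat([s, c, zeros], dim=1)
    return torch.stack([row0, row1], dim=1)

# Full Jacobian of every output entry w.r.t. every input entry:
J = jacobian(make_rot_mat, theta)  # shape [B, 2, 3, B, 1]

# Squared L2 (Frobenius) norm of the full Jacobian; for a pure rotation
# this equals 2 * B, since each sample contributes sin^2 + cos^2 twice.
grad_norm = (J ** 2).sum()
```

If your PyTorch version supports it, `jacobian` also accepts a `vectorize=True` argument (marked experimental) that avoids the internal Python loop over rows.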
