# Issues with gradient computed with torch.autograd

I have a rotation matrix of shape [B, 2, 3], where B is the batch size. This matrix is parameterized by the rotation angle θ of shape [B, 1]. If I use torch.autograd to compute the gradient of this rotation matrix with respect to θ, I end up with a gradient of shape [B, 1]. Shouldn’t this gradient be of shape [B, 2, 3] instead (where we simply take the gradient of each entry of the rotation matrix)? Here is a snippet of my code.

```python
import torch

batch_size = 16  # example value; not shown in the original snippet

torch_pi = torch.acos(torch.zeros(1)).item() * 2
theta = 2 * torch_pi * torch.rand((batch_size, 1), requires_grad=True) - torch_pi

rot_mat = torch.zeros((batch_size, 2, 3))
mask1 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask1[:, 0, 0] = True
mask2 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask2[:, 0, 1] = True
mask3 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask3[:, 1, 0] = True
mask4 = torch.zeros_like(rot_mat, dtype=torch.bool)
mask4[:, 1, 1] = True

# fill the four rotation entries through the masks
rot_mat = rot_mat.masked_scatter(mask1, torch.cos(theta))
rot_mat = rot_mat.masked_scatter(mask2, -torch.sin(theta))
rot_mat = rot_mat.masked_scatter(mask3, torch.sin(theta))
rot_mat = rot_mat.masked_scatter(mask4, torch.cos(theta))

grad = torch.autograd.grad(rot_mat, theta,
                           grad_outputs=torch.ones_like(rot_mat))[0]
# grad is of shape [B, 1] instead of [B, 2, 3]
```

I wrote another piece of code with for-loops, but it is too slow. I would rather not use for-loops, and I was wondering if it is possible to get the gradient matrix without them.

```python
# out is the rotation matrix flattened per sample, e.g. rot_mat.view(B, -1)
grad_norm = 0.0
for i in range(out.size(0)):
    for j in range(out.size(1)):
        grad = torch.autograd.grad(
            outputs=out[i, j],
            inputs=theta,
            retain_graph=True
        )[0]
        grad_norm += torch.sum(grad ** 2) + 1e-12
```

Hi,

theta is of shape [B, 1] here, right? Since it’s the input you are asking for gradients with respect to, it’s not surprising that the result has the same size.
Keep in mind that autograd.grad computes a vector-Jacobian product, where the vector is the grad_outputs that you provided.
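For instance (a minimal sketch with a toy function, not the code from the question):

```python
import torch

theta = torch.randn(4, 1, requires_grad=True)  # input of shape [B, 1], B = 4
out = torch.cat([torch.cos(theta), torch.sin(theta)], dim=1)  # output [B, 2]

# autograd.grad computes v^T @ J, where v = grad_outputs has the shape of
# the output; the result always has the shape of the input.
v = torch.ones_like(out)
g = torch.autograd.grad(out, theta, grad_outputs=v)[0]
print(g.shape)  # torch.Size([4, 1]) -- same as theta, not as out
```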

Seeking a clarification here: if I need to compute the gradient of the rotation matrix with respect to theta, should I swap inputs and outputs in torch.autograd.grad?

PS:
I just tried differentiating a plain rotation matrix of shape [2, 3] w.r.t. theta, and I end up with a single value again. If I swap inputs and outputs as described above, I get the error `One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.` Setting allow_unused=True returns a gradient of None.

I think there is some confusion about what the gradient/Jacobian are:
You have a function fn: theta -> rot_mat. For simplicity, we’ll assume the input and output are 1D vectors of size B and 6B respectively.
So the Jacobian of this function is a 2D matrix of size [6B, B], and what autograd computes is the product of a vector of size 6B with this matrix.
In your case, you gave a vector full of ones, so what you end up with is the sum of all the rows of the Jacobian, which is an output of size B.

When you say gradient here, I am not sure what you’re talking about, especially given that you expect its size to be 6B. The full Jacobian would contain 6B × B values.
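To make the "sum of the rows" point concrete, here is a small check with a toy function (sizes are illustrative, not from the original code):

```python
import torch
from torch.autograd.functional import jacobian

B = 3
theta = torch.randn(B, requires_grad=True)

def fn(t):
    # toy stand-in for theta -> rot_mat, flattened to a vector of size 2B
    return torch.cat([torch.cos(t), torch.sin(t)])

# vector-Jacobian product with a vector of ones...
out = fn(theta)
vjp = torch.autograd.grad(out, theta, grad_outputs=torch.ones_like(out))[0]

# ...matches the sum over the rows of the full [2B, B] Jacobian
J = jacobian(fn, theta)
assert J.shape == (2 * B, B)
assert torch.allclose(vjp, J.sum(dim=0))
```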

Sorry, I didn’t explain it correctly. Yes, I need the full Jacobian with 6B × B values. What is the most efficient way to compute it? I don’t want to use a double loop over all 6B × B values, and I assume it can’t be computed with a single backward pass of torch.autograd. Ultimately, I want to compute the L2 norm of this full Jacobian. If there’s a more efficient way to do that, let me know. I am still new to PyTorch and learning about it.

Hi,

Unfortunately, you will need a loop to get this: you provide a vector that contains a single 1 each time, to extract the Jacobian row by row.
Note that we provide a function to do that nicely here: https://pytorch.org/docs/stable/autograd.html#torch.autograd.functional.jacobian
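As a sketch of how that helper could look for the theta -> rot_mat case above (the `make_rot_mat` construction is an assumption: a standard 2D rotation with the translation column left at zero):

```python
import math

import torch
from torch.autograd.functional import jacobian

batch_size = 4
theta = torch.rand(batch_size, 1) * 2 * math.pi - math.pi

def make_rot_mat(theta):
    # theta: [B, 1] -> rot_mat: [B, 2, 3]
    c, s = torch.cos(theta), torch.sin(theta)
    zeros = torch.zeros_like(theta)
    row0 = torch.cat([c, -s, zeros], dim=1)
    row1 = torch.cat([s, c, zeros], dim=1)
    return torch.stack([row0, row1], dim=1)

# Full Jacobian of every output entry w.r.t. every input entry:
J = jacobian(make_rot_mat, theta)  # shape [B, 2, 3, B, 1]

# Squared L2 (Frobenius) norm of the full Jacobian; for a pure rotation
# this equals 2 * B, since each sample contributes sin^2 + cos^2 twice.
grad_norm = (J ** 2).sum()
```

If your PyTorch version supports it, `jacobian` also accepts a `vectorize=True` argument (marked experimental) that avoids the internal Python loop over rows.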
