I think there is some confusion about what gradients/jacobians are:
You have a function fn: theta -> rot_mat. For simplicity, we'll assume both of these are 1D vectors, of sizes B and 6B respectively.
So the Jacobian of this function is a 2D matrix of size [6B, B]. What autograd computes is a vector-Jacobian product: you give it a vector of size 6B, and it multiplies that vector with this matrix.
In your case, you passed a vector full of ones, so what you end up with is the sum of all the rows of the Jacobian, which gives you an output of size B.
When you say gradient here, I am not sure what you mean, especially given that you expect it to have size 6B. The full Jacobian would contain 6B x B values.
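To make this concrete, here is a small sketch of the difference between the two. The `fn` below is just a hypothetical stand-in (your real function maps theta to a rotation matrix); the point is only the shapes: `torch.autograd.grad` with a vector of ones gives the size-B row-sum of the Jacobian, while `torch.autograd.functional.jacobian` gives the full [6B, B] matrix.

```python
import torch

B = 3
theta = torch.randn(B, requires_grad=True)

# Hypothetical stand-in for fn: theta -> rot_mat, flattened to a 1D
# vector of size 6B (6 arbitrary elementwise functions of theta).
def fn(theta):
    return torch.cat([theta ** 2, torch.sin(theta), 2 * theta,
                      theta ** 3, torch.cos(theta), -theta])

out = fn(theta)  # 1D vector of size 6B

# Vector-Jacobian product with a vector of ones: result of size B,
# equal to summing the 6B rows of the Jacobian together.
vjp = torch.autograd.grad(out, theta, grad_outputs=torch.ones_like(out))[0]

# The full Jacobian, of shape [6B, B], i.e. 6B x B values.
jac = torch.autograd.functional.jacobian(fn, theta)

print(vjp.shape)  # torch.Size([3])
print(jac.shape)  # torch.Size([18, 3])
```

Summing the rows of `jac` (i.e. `jac.sum(dim=0)`) reproduces `vjp` exactly, which is what happened in your case. If you actually want the full Jacobian, `torch.autograd.functional.jacobian` (or B backward passes, one per basis vector) is the way to get it.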