Why can't backward() calculate whole Jacobians directly?


I’m wondering why backward() can’t be called on vector-valued outputs without a gradient argument, thereby computing the whole Jacobian instead of a vector-Jacobian product.

The interface would be straightforward (just add another dimension to the gradient tensor), so I guess it must be for efficiency reasons, but I’m curious what exactly happens here.
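For context, here is a minimal sketch of the current behavior: calling backward() on a non-scalar output raises an error unless you pass a vector v, and you then get the vector-Jacobian product vᵀJ rather than J itself. The function here is a toy example.

```python
import torch

x = torch.randn(3, requires_grad=True)
y = x * 2  # non-scalar output, so Jacobian J = 2 * I

# y.backward() alone raises "grad can be implicitly created only for
# scalar outputs"; you must supply the vector v of the product v^T @ J.
v = torch.tensor([1.0, 0.0, 0.0])
y.backward(v)
print(x.grad)  # row 0 of the Jacobian: tensor([2., 0., 0.])
```

Recovering the full Jacobian this way means one backward pass per output row, each with a different one-hot v.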

It’s simply not implemented and currently not considered in scope (other frameworks such as JAX provide more general derivative transforms). For some operations you can trick autograd into producing a full Jacobian using broadcasting. There is some related discussion in an old issue, and a less related forum thread is linked there. Another route would be to implement it yourself on top of Torch IR graphs, or to implement dual numbers for tensors.
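The broadcasting trick mentioned above can be sketched as follows: repeat the input n times along a batch dimension so that each row produces one copy of the output, then call backward once with an identity matrix as the gradient; row i of the resulting gradient is row i of the Jacobian. This works for any function that is vectorized independently over a leading batch dimension; the function f below is just an illustrative example.

```python
import torch

# A toy vector-valued function, vectorized over a leading batch dim.
def f(x):
    return x ** 2 + x.sum(dim=-1, keepdim=True)

x = torch.randn(3)
n = x.numel()

# Repeat the input n times so each row computes an independent copy of
# the output; backward with an identity matrix then places row i of the
# Jacobian into row i of the gradient.
x_rep = x.detach().repeat(n, 1).requires_grad_(True)
y = f(x_rep)                  # shape (n, n)
y.backward(torch.eye(n))
jacobian = x_rep.grad         # shape (n, n): full Jacobian of f at x
```

For this f the analytic Jacobian is diag(2x) + 1, which the snippet reproduces, at the cost of n-fold redundant forward computation.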

Best regards