I am trying to get a sense of the level of support in TorchScript for backpropagation graphs produced by autograd. If anyone can provide a quick summary and/or pointers on how to experiment with this, it would be much appreciated.
TorchScript has full support for PyTorch’s tape-based autograd. You can call backward() on your tensors if you are recording gradients, and it should work.
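For example (a minimal sketch): a scripted function still records onto the autograd tape, so the usual eager-style backward() works on its result:

```python
import torch

@torch.jit.script
def scaled_sum(x: torch.Tensor) -> torch.Tensor:
    return (x * x).sum()

x = torch.randn(3, requires_grad=True)
y = scaled_sum(x)   # the scripted call is recorded on the autograd tape
y.backward()        # ordinary dynamic autograd computes the gradient
print(x.grad)       # equals 2 * x
```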
Thanks for the prompt response. I am interested in tracing through the backward graph using TorchScript and dumping the IR for the autodiff-ed backprop graph, for full-graph optimization in a separate framework. To be precise: for the backward of a matmul, I’d expect the dumped IR graph to contain the appropriately transposed matmuls for the backward pass. Would you expect this to be possible?
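For reference, this is the eager-mode identity I would hope to see reflected in the IR (a quick sanity check, not TorchScript output):

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
c = a.matmul(b)

go = torch.ones_like(c)  # incoming gradient dL/dC
ga, gb = torch.autograd.grad(c, (a, b), grad_outputs=go)

# the backward of matmul is itself expressed with transposed matmuls:
print(torch.allclose(ga, go.matmul(b.t())))  # dA = dC @ B^T -> True
print(torch.allclose(gb, a.t().matmul(go)))  # dB = A^T @ dC -> True
```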
Ah, we do not have a public API for exposing a static backward graph, as PyTorch relies on dynamic autograd for automatic differentiation. We do have an internal API for symbolic differentiation (see torch/csrc/jit/runtime/autodiff.cpp), which you can play with, but it is not complete and we don’t have any stability guarantees about it.
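If you want to poke around anyway, here are a couple of starting points (exploratory only; none of this is a stable API, and exact behavior depends on your build):

```python
import torch

@torch.jit.script
def mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a.matmul(b)

print(mm.graph)  # forward-only SSA IR; no backward ops appear here

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
mm(a, b).sum().backward()

# On builds that expose it, this dumps whatever graph the profiling
# executor last ran (forward ops, possibly grouped into
# prim::DifferentiableGraph nodes, but still not a standalone backward graph):
print(torch.jit.last_executed_optimized_graph())
```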
Hi Michael,
My understanding is that TorchScript is converted to an SSA IR and then executed. How does autograd work on the SSA IR?
Do you mean autodiff.cpp isn’t enabled for TorchScript now?
Curious if you could elaborate on how “dynamic” plays into this: if a compiled TorchScript model has been through profile-guided optimization and had all of its control flow stripped out, the actual autograd graph structure should be the same on every inference pass, yes?
When I run autograd with create_graph=True, is a graph being created that PyTorch knows how to execute, or only one that it knows how to differentiate further?
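To make the question concrete, here is the kind of toy experiment I have been running in eager mode (just an illustration):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward computation itself, so the
# returned gradient carries a grad_fn and can be differentiated again
(g,) = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 3x^2 = 12
(g2,) = torch.autograd.grad(g, x)                     # d2y/dx2 = 6x = 12
print(g.item(), g2.item())
```

Differentiating g again clearly works; what I can’t tell is whether the graph behind g is something PyTorch could also re-execute or serialize on its own.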
The motivation for these questions is models whose inference pass involves a gradient of their initial prediction, and the hope that it might be possible to save or cache the compute graph representing that gradient as a model in its own right.
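Concretely, the pattern I have in mind looks something like this (a rough sketch; GradModel is a hypothetical name, and it assumes a PyTorch version where torch.autograd.grad is supported inside TorchScript):

```python
import torch

class GradModel(torch.nn.Module):
    """Hypothetical wrapper whose output *is* the gradient of an inner
    scalar prediction with respect to the input."""
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.detach().requires_grad_(True)
        pred = self.inner(x).sum()
        g = torch.autograd.grad([pred], [x])[0]
        assert g is not None
        return g

scripted = torch.jit.script(GradModel(torch.nn.Linear(4, 1)))
out = scripted(torch.randn(8, 4))
torch.jit.save(scripted, "grad_model.pt")  # the saved graph still calls into
                                           # dynamic autograd at run time, rather
                                           # than baking in a static backward graph
```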