I am trying to get a sense of the level of support in TorchScript for backpropagation graphs produced by autograd. If anyone can provide a quick summary and/or pointers on how to experiment with this, it would be much appreciated.
TorchScript has full support for PyTorch’s tape-based autograd. You can call backward() on your tensors if you are recording gradients, and it should work.
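For example (a minimal sketch): a scripted function still records onto the autograd tape, so the usual eager-style backward() works on its result:

```python
import torch

@torch.jit.script
def scaled_sum(x: torch.Tensor) -> torch.Tensor:
    return (x * x).sum()

x = torch.randn(3, requires_grad=True)
y = scaled_sum(x)   # the scripted call is recorded on the autograd tape
y.backward()        # ordinary dynamic autograd computes the gradient
print(x.grad)       # equals 2 * x
```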
Thanks for the prompt response. I am interested in tracing through the backward graph using TorchScript and dumping the IR for the autodiff-ed backprop graph, for full-graph optimization in a separate framework. To be precise: for the backward of a matmul, I’d expect the dumped IR graph to contain the appropriately transposed matmuls for the backward pass. Would you expect this to be possible?
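For reference, this is the eager-mode identity I would hope to see reflected in the IR (a quick sanity check, not TorchScript output):

```python
import torch

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
c = a.matmul(b)

go = torch.ones_like(c)  # incoming gradient dL/dC
ga, gb = torch.autograd.grad(c, (a, b), grad_outputs=go)

# the backward of matmul is itself expressed with transposed matmuls:
print(torch.allclose(ga, go.matmul(b.t())))  # dA = dC @ B^T -> True
print(torch.allclose(gb, a.t().matmul(go)))  # dB = A^T @ dC -> True
```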
Ah, we do not have a public API for exposing a static backward graph, as PyTorch relies on dynamic autograd for automatic differentiation. We do have an internal API for symbolic differentiation (see torch/csrc/jit/runtime/autodiff.cpp), which you can play with, but it is not complete and we don’t have any stability guarantees about it.
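If you want to poke around anyway, here are a couple of starting points (exploratory only; none of this is a stable API, and exact behavior depends on your build):

```python
import torch

@torch.jit.script
def mm(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    return a.matmul(b)

print(mm.graph)  # forward-only SSA IR; no backward ops appear here

a = torch.randn(2, 3, requires_grad=True)
b = torch.randn(3, 4, requires_grad=True)
mm(a, b).sum().backward()

# On builds that expose it, this dumps whatever graph the profiling
# executor last ran (forward ops, possibly grouped into
# prim::DifferentiableGraph nodes, but still not a standalone backward graph):
print(torch.jit.last_executed_optimized_graph())
```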
Hi Michael,
My understanding is that TorchScript is converted to an SSA IR and then executed. How does autograd work on the SSA IR?
Do you mean autodiff.cpp isn’t enabled for TorchScript now?
Curious if you could elaborate on how “dynamic” plays into this: if a compiled TorchScript model has been through profile-guided optimization and had all of its control flow stripped out, the actual autograd graph structure should be the same on every inference pass, yes?
When I run autograd with create_graph=True, is a graph being created that PyTorch knows how to execute, or only one that it knows how to differentiate further?
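To make the question concrete, here is the kind of toy experiment I have been running in eager mode (just an illustration):

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# create_graph=True records the backward computation itself, so the
# returned gradient carries a grad_fn and can be differentiated again
(g,) = torch.autograd.grad(y, x, create_graph=True)  # dy/dx = 3x^2 = 12
(g2,) = torch.autograd.grad(g, x)                     # d2y/dx2 = 6x = 12
print(g.item(), g2.item())
```

Differentiating g again clearly works; what I can’t tell is whether the graph behind g is something PyTorch could also re-execute or serialize on its own.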
The motivation for these questions is models whose inference pass involves a gradient of their initial prediction, and the hope that it might be possible to save or cache the compute graph representing that gradient as a model in its own right.
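Concretely, the pattern I have in mind looks something like this (a rough sketch; GradModel is a hypothetical name, and it assumes a PyTorch version where torch.autograd.grad is supported inside TorchScript):

```python
import torch

class GradModel(torch.nn.Module):
    """Hypothetical wrapper whose output *is* the gradient of an inner
    scalar prediction with respect to the input."""
    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.detach().requires_grad_(True)
        pred = self.inner(x).sum()
        g = torch.autograd.grad([pred], [x])[0]
        assert g is not None
        return g

scripted = torch.jit.script(GradModel(torch.nn.Linear(4, 1)))
out = scripted(torch.randn(8, 4))
torch.jit.save(scripted, "grad_model.pt")  # the saved graph still calls into
                                           # dynamic autograd at run time, rather
                                           # than baking in a static backward graph
```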