I have a big composite module that JIT-compiles and runs forward() fine, but fails in backward(). The worst part is that there is no error with JIT disabled, and torch.autograd.set_detect_anomaly() is not very helpful in JIT mode.
I'm 90% sure the error itself is related to how the JIT incorrectly enables requires_grad in several scenarios. But are there any techniques to localize it?
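One localization technique that can help (a sketch under assumptions, not a definitive recipe): script each submodule in isolation and run forward + backward through it, to bisect which submodule fails only under TorchScript. `bisect_jit_backward` is a hypothetical helper of my own, and it assumes every submodule accepts a single `example_input`-shaped tensor; adapt the input plumbing to your real module tree. Note the warm-up loop: with the profiling executor, the differentiable graph (and its generated backward) is only created after the first couple of invocations.

```python
import torch
import torch.nn as nn


def bisect_jit_backward(module: nn.Module, example_input: torch.Tensor) -> None:
    # Hypothetical helper: script each submodule in isolation and run
    # forward + backward to localize a TorchScript-only backward failure.
    # Assumes each submodule accepts one example_input-shaped tensor.
    for name, sub in module.named_modules():
        try:
            scripted = torch.jit.script(sub)
            # Warm-up loop: the profiling executor builds the
            # differentiable graph only after a couple of runs,
            # so a single call may not reach the failing backward.
            for _ in range(3):
                x = example_input.clone().requires_grad_(True)
                out = scripted(x)
                if isinstance(out, torch.Tensor):
                    out.sum().backward()
        except Exception as e:
            print(f"{name or '<root>'}: FAILED ({type(e).__name__})")
        else:
            print(f"{name or '<root>'}: ok")
```

Feed it the same input you give the full module; the first submodule that fails only when scripted is the one to stare at.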
For reference, here is the exception text:

```
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 138, in <backward op>
            dim: int):
        def backward(grad_outputs: List[Tensor]):
            grad_self = torch.stack(grad_outputs, dim)
                        ~~~~~~~~~~~ <--- HERE
            return grad_self, None
RuntimeError: sizes() called on undefined Tensor
```
So this is some generated code, apparently the backward for an unbind() operation, where the outputs have inconsistent requires_grad? Also note how the traceback cutting off at the `dim: int` line does a real disservice here.
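To sanity-check that hypothesis, here is an eager-mode sketch of the pattern I suspect: when only some outputs of unbind() reach the loss, the incoming gradients for the unused slices are undefined at unbind's backward. Eager autograd materializes them as zeros, whereas the generated TorchScript backward above seems to torch.stack() the grad_outputs directly, which would explain `sizes() called on undefined Tensor`. (This is my reading of the traceback, not a confirmed root cause.)

```python
import torch

x = torch.randn(3, 4, requires_grad=True)
a, b, c = torch.unbind(x, dim=0)

# Only `a` reaches the loss, so the gradients flowing back into
# unbind's backward for `b` and `c` are undefined.
loss = a.sum()
loss.backward()

# Eager autograd materializes the missing gradients as zeros:
print(x.grad[0])  # ones (from a.sum())
print(x.grad[1])  # zeros (undefined grad filled in)
```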
With anomaly detection enabled I do get a warning:

```
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in struct torch::jit::`anonymous namespace'::DifferentiableGraphBackward. Traceback of forward call that caused the error:
```

but the forward traceback it prints stops at the JIT module boundary.