I have a big composite module that jit-compiles and runs forward() fine, but fails in backward(). The big issue is that there is no error with jit disabled, and set_detect_anomaly() is not very helpful in jit mode.
I’m 90% sure the error itself is related to how jit incorrectly enables requires_grad in multiple scenarios. But are there any techniques to localize it?
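The only localization approach I can think of is brute-force bisection over submodules, roughly like the sketch below (make_dummy_input is a placeholder, and it assumes each child takes and returns a single tensor, which is not true for my real module), but that gets painful for a big composite model:

```python
import torch

# Rough bisection sketch: script each child module in isolation, push a dummy
# input through it a few times so the profiling executor actually builds the
# differentiable graph, and see whose backward raises.
# `make_dummy_input` is a placeholder I would have to write per child.
def find_bad_submodule(model: torch.nn.Module, make_dummy_input):
    for name, child in model.named_children():
        scripted = torch.jit.script(child)
        x = make_dummy_input(name).requires_grad_()
        try:
            for _ in range(3):  # warm-up runs before the optimized graph kicks in
                scripted(x).sum().backward()
        except RuntimeError as e:
            print(f"backward failed in '{name}': {e}")
```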
For reference, here is the exception text:
The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
File "<string>", line 138, in <backward op>
dim: int):
def backward(grad_outputs: List[Tensor]):
grad_self = torch.stack(grad_outputs, dim)
~~~~~~~~~~~ <--- HERE
return grad_self, None
RuntimeError: sizes() called on undefined Tensor
So this is some generated backward code, apparently for an unbind() operation whose outputs end up with inconsistent requires_grad?
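I don't have a minimal repro yet, but the pattern I imagine is roughly this (made-up shapes and names, not my real module, and I'm not certain this exact snippet triggers the same error):

```python
import torch

# Hypothetical pattern: unbind() where only one slice feeds the loss, so the
# grads for the other slices would arrive undefined when the generated
# backward re-stacks grad_outputs.
@torch.jit.script
def suspect(x: torch.Tensor) -> torch.Tensor:
    parts = torch.unbind(x, 0)      # list of row views of x
    return (parts[0] * 2.0).sum()   # parts[1:] never used in the loss

x = torch.randn(3, 4, requires_grad=True)
# A few runs so the profiling executor builds the differentiable graph
# before backward is exercised.
for _ in range(3):
    suspect(x).backward()
```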
Also note how the traceback being cut off at the “dim: int” line does the reader a disservice.
And the console shows:
[W …\torch\csrc\autograd\python_anomaly_mode.cpp:104] Warning: Error detected in struct torch::jit::`anonymous namespace'::DifferentiableGraphBackward. Traceback of forward call that caused the error:
…
and the traceback of the forward call stops at the jit module, so it doesn't point anywhere useful in my own code.