Error when performing autograd gradcheck twice on a TorchScript function

Consider the following code, which I have extracted as a minimal example from what is actually a more complicated scenario:

import torch

@torch.jit.script
def func_jit(x):
    return x.mul(x.tanh())

print(torch.__version__)
X = torch.tensor(1.23, requires_grad=True, dtype=torch.double)
print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
X = torch.tensor(1.23, requires_grad=True, dtype=torch.double)
print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))

This gives the following output:

1.12.1
True
Traceback (most recent call last):
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 839, in _test_batched_grad
    result = vmap(vjp)(torch.stack(grad_outputs))
  File "PATH/lib/python3.9/site-packages/torch/_vmap_internals.py", line 271, in wrapped
    batched_outputs = func(*batched_inputs)
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 822, in vjp
    results = grad(v)
  File "PATH/lib/python3.9/site-packages/torch/autograd/__init__.py", line 276, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Cannot access data pointer of Tensor that doesn't have storage


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "PATH/test_act_funcs.py", line 86, in <module>
    print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1414, in gradcheck
    return _gradcheck_helper(**args)
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1442, in _gradcheck_helper
    _test_batched_grad(tupled_inputs, o, i)
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 845, in _test_batched_grad
    raise GradcheckError(
torch.autograd.gradcheck.GradcheckError: While computing batched gradients, got: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Cannot access data pointer of Tensor that doesn't have storage


gradcheck or gradgradcheck failed while testing batched gradient computation.
This could have been invoked in a number of ways (via a test that calls
gradcheck/gradgradcheck directly or via an autogenerated test).

If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
  with `check_batched_grad=False` as a keyword argument.
- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test
  to have `check_batched_grad=False` and/or `check_batched_gradgrad=False`.

If you're modifying an existing operator that supports batched grad computation,
or wish to make a new operator work with batched grad computation, please read
the following.

To compute batched grads (e.g., jacobians, hessians), we vmap over the backward
computation. The most common failure case is if there is a 'vmap-incompatible
operation' in the backward pass. Please see
NOTE: [How to write vmap-compatible backward formulas]
in the codebase for an explanation of how to fix this.

This error goes away if func_jit returns just x.mul(x) or x.tanh(), but it is also present for x.mul(x.mul(x)). It seems that having more than a single operation triggers the error? What is going on here?
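
Concretely, the variants I mean look like this (the function names are just for illustration; each was checked with the same two gradcheck calls as above):

@torch.jit.script
def func_single_op(x):
    return x.tanh()  # passes both gradcheck calls (same for x.mul(x))

@torch.jit.script
def func_two_ops(x):
    return x.mul(x.mul(x))  # fails on the second gradcheck call, just like x.mul(x.tanh())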

Setting check_batched_grad=False ‘resolves’ the issue. Is batched grad fundamentally incompatible with torch.jit.script? Note, however, that even with check_batched_grad=True the first call to torch.autograd.gradcheck succeeds without problems; it is only the second call that fails.
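
For reference, the call that then runs cleanly (even when repeated) is the same as above, only with the batched-grad check disabled:

print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=False, check_backward_ad=True))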

I think you need to add warmup iterations to allow the JIT to optimize the code.
Right now you are comparing the scripted version in different optimization stages to the eager-mode output.
This should work:

import torch

@torch.jit.script
def func_jit(x):
    return x.mul(x.tanh())

print(torch.__version__)
X = torch.tensor(1.23, requires_grad=True, dtype=torch.double, device='cuda')

for _ in range(3):
    out = func_jit(x)

print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))

Is there a guaranteed number of warmup iterations that I can always use to let the JIT optimize the code? I.e., is 3 arbitrary here, or is the number documented somewhere?

When I run your code (typo: the x inside the for-loop should be an upper-case X), I now get:

1.12.1
Traceback (most recent call last):
  File "PATH/test_act_funcs.py", line 120, in <module>
    print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1414, in gradcheck
    return _gradcheck_helper(**args)
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1428, in _gradcheck_helper
    _gradcheck_real_imag(gradcheck_fn, func, func_out, tupled_inputs, outputs, eps,
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1075, in _gradcheck_real_imag
    gradcheck_fn(func, func_out, tupled_inputs, outputs, eps,
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 1124, in _slow_gradcheck
    analytical = _check_analytical_jacobian_attributes(tupled_inputs, o, nondet_tol, check_grad_dtypes)
  File "PATH/lib/python3.9/site-packages/torch/autograd/gradcheck.py", line 549, in _check_analytical_jacobian_attributes
    raise GradcheckError('Backward is not reentrant, i.e., running backward with '
torch.autograd.gradcheck.GradcheckError: Backward is not reentrant, i.e., running backward with same input and grad_output multiple times gives different values, although analytical gradient matches numerical gradient.The tolerance for nondeterminism was 0.0.

NOTE: If your op relies on non-deterministic operations i.e., it is listed here:
https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html
this failure might be expected.

If you are adding a new operator, please file an issue and then use one of the
workarounds. The workaround depends on how your test invokes gradcheck/gradgradcheck.
If the test
- manually invokes gradcheck/gradgradcheck, then call gradcheck/gradgradcheck
  with `nondet_tol=<tol>` as a keyword argument.
- is OpInfo-based (e.g., in test_ops_gradients.py), then modify the OpInfo for the test
  to have `gradcheck_nondet_tol=<tol>`.
- is a Module test (e.g., in common_nn.py), then modify the corresponding
  module_test entry to have `gradcheck_nondet_tol=<tol>`

I don’t believe there are any non-deterministic operations in use (I checked, and e.g. running without the torch.jit.script decorator works fine), so is this a sign that the ‘backward JIT code’ also needs warmup? Or does TorchScript generally produce non-deterministic backward passes?
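
In case it helps, this is the kind of warmup I would try next, exercising the backward pass as well before running gradcheck (just a guess on my side; torch.autograd.grad is only there to force the backward graph to be built and executed):

for _ in range(3):
    out = func_jit(X)
    torch.autograd.grad(out, X)  # also run backward, so any backward-side JIT specialization gets warmed up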

If I change the script to make three gradcheck calls at the end, the first with raise_exception=False and the next two with raise_exception=True.
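
Roughly, the tail of the script is then (other flags unchanged from above):

print(torch.autograd.gradcheck(func_jit, X, raise_exception=False, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))
print(torch.autograd.gradcheck(func_jit, X, raise_exception=True, check_undefined_grad=True, check_batched_grad=True, check_backward_ad=True))

With that I get: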

1.12.1
False
PATH/lib/python3.9/site-packages/torch/autograd/__init__.py:276: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
 (Triggered internally at  /opt/conda/conda-bld/pytorch_1659484775609/work/torch/csrc/jit/codegen/cuda/manager.cpp:334.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
True
True

Hmm, what is going on here?

More info:
If I set export PYTORCH_NVFUSER_DISABLE=fallback as it suggests, then I see this again:

...
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Cannot access data pointer of Tensor that doesn't have storage
...