I’m interested in computing gradients of my model outputs w.r.t. its inputs more efficiently, and I came across this tutorial. Unfortunately, when I run the code exactly as written in the tutorial, I get an AssertionError in the check that the gradients computed with function transforms are close to those from the naive per-sample computation. This is the code that fails:
```python
for per_sample_grad, ft_per_sample_grad in zip(per_sample_grads, ft_per_sample_grads.values()):
    assert torch.allclose(per_sample_grad, ft_per_sample_grad, atol=3e-3, rtol=1e-5)
```
It seems like an issue to me that code in an official PyTorch tutorial would fail like this.
I checked manually, and the largest difference I see between the two sets of gradients is 0.0120, well above the 3e-3 absolute tolerance.
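For reference, this is roughly how I did the manual check: instead of stopping at the first failing assert, collect the maximum absolute difference per parameter. The helper name `max_grad_diffs` is my own, and the variable names just mirror the tutorial snippet; this is a diagnostic sketch, not code from the tutorial.

```python
import torch

def max_grad_diffs(per_sample_grads, ft_per_sample_grads):
    """Return the max absolute difference per parameter between the
    naively computed per-sample gradients (a sequence of tensors) and
    the function-transform ones (a dict of tensors, as in the tutorial)."""
    return [
        (naive - ft).abs().max().item()
        for naive, ft in zip(per_sample_grads, ft_per_sample_grads.values())
    ]

# Toy illustration with synthetic tensors standing in for real gradients:
naive = torch.ones(4, 3)
ft = naive + 0.01  # pretend the transform result is off by ~0.01
print(max_grad_diffs([naive], {"weight": ft}))
```

Printing these values per parameter makes it easy to see whether the mismatch is concentrated in one layer or spread across the whole model.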
I’m using Python 3.11.6 and CUDA 12.3 on an NVIDIA A30 GPU.