Issue with Per-Sample Gradients tutorial

I’m interested in computing gradients of my model’s outputs w.r.t. its inputs more efficiently, and I came across this tutorial. Unfortunately, when running the code exactly as written in the tutorial, I get an AssertionError in the check that the gradients computed with function transforms are close to the naive computation. This is the code that fails:

for per_sample_grad, ft_per_sample_grad in zip(per_sample_grads, ft_per_sample_grads.values()):
    assert torch.allclose(per_sample_grad, ft_per_sample_grad, atol=3e-3, rtol=1e-5)

It seems like an issue to me that code in an official PyTorch tutorial would fail like this.

I checked manually, and the differences between the two sets of gradients are as large as 0.0120.
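For reference, this is roughly how I measured that discrepancy. The tensors below are hypothetical stand-ins; in the tutorial, `per_sample_grads` comes from the naive per-sample loop and `ft_per_sample_grads` from `vmap(grad(...))`:

```python
import torch

# Hypothetical stand-ins for one parameter's per-sample gradients from the
# two methods (naive loop vs. function transforms). Shapes are illustrative.
naive_grad = torch.zeros(64, 10)
ft_grad = torch.full((64, 10), 0.012)

# Largest elementwise discrepancy between the two computations.
max_diff = (naive_grad - ft_grad).abs().max().item()
print(max_diff)
```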

I’m using Python 3.11.6 and CUDA 12.3 on an NVIDIA A30 GPU.

I couldn’t reproduce this on the main branch. What version of PyTorch are you using?

Name        Version    Build
pytorch     2.2.2      py3.11_cuda12.1_cudnn8.9.2_0

Are you able to update to a later version of PyTorch? It also does not fail on Colab (PyTorch 2.3.0).

Still getting the error with PyTorch 2.3.0.

Hmm, this might be an issue specific to your GPU. cc @ptrblck

I would disable TF32, since FP32 thresholds are used, as described here.
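A minimal sketch of what I mean, assuming you run this before the tutorial's comparison (the A30 is an Ampere GPU, where TF32 is enabled by default for matmuls):

```python
import torch

# Force full FP32 precision for matmuls and cuDNN convolutions so that
# FP32-level tolerances like atol=3e-3 in torch.allclose are meaningful.
torch.backends.cuda.matmul.allow_tf32 = False
torch.backends.cudnn.allow_tf32 = False
```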

Thanks, this ended up being the issue.