I’m training a model that uses a single atan2 in its final stage. I’ve enabled anomaly detection with torch.autograd.set_detect_anomaly(True), and after a while I get an error that “[f]unction ‘Atan2Backward0’ returned nan values in its 0th output.” I know that atan2 can produce nan if both the numerator and denominator are 0, so I add a small epsilon value to any denominator values that are zero:

# Calculate the phase of a complex spectrogram
def find_phase(self, subbands: torch.Tensor) -> torch.Tensor:
    numerator = subbands[:, 1:2, :, :]
    denominator = subbands[:, 0:1, :, :]
    # Add epsilon to the denominator to avoid nan
    epsilon = 1e-7
    nudge = (denominator == 0) * epsilon
    denominator = denominator + nudge
    return torch.atan2(numerator, denominator)

Strangely, though, the error still occurs. I wonder if I’m missing something here, and if there’s a better way to ensure that atan2 doesn’t return nan. Thanks!

Also, I’m training on an NVIDIA A100 with PyTorch 2.0.0 + CUDA 11.7, if that’s any help.
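In case it helps, here’s a minimal standalone repro of the failure mode (just atan2 on zero tensors, nothing from my actual model):

```python
import torch

# atan2(0, 0) is finite (zero) in the forward pass, but its gradient
# divides by x**2 + y**2 == 0, producing NaN in the backward pass.
y = torch.zeros(1, requires_grad=True)
x = torch.zeros(1, requires_grad=True)

out = torch.atan2(y, x)
print(out)  # finite -- the forward pass is fine

out.backward()
print(y.grad, x.grad)  # both NaN
```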

So I iterated over all of the model parameters after calling loss.backward() and checked for NaN in each of the gradients and weights.

# Check for non-finite gradients and weights
all_gradients_are_finite = True
for parameter_name, parameter in self.model.named_parameters():
    if parameter.grad is not None and not parameter.grad.isfinite().all():
        print("Parameter gradient is non-finite: {}".format(parameter_name))
        all_gradients_are_finite = False
    if not parameter.data.isfinite().all():
        print("Parameter data is non-finite: {}".format(parameter_name))

None of the weights contain NaN (probably because my training loop avoids calling optimizer.step() if any values are NaN) but all of the gradients do. The first layer with a NaN gradient in the backward pass is the last trainable layer of the model (Conv2d). This makes sense, I suppose, because the atan2 call in question appears after that final layer: it’s part of an output stage where I convert the estimated spectrogram back to a waveform. So if atan2 returns NaN in the backward pass it would propagate to the whole model.

I checked the inputs to the find_phase method and they don’t contain NaN at all during the forward pass. The loss doesn’t contain NaN either (as long as I don’t call optimizer.step() when NaN gradients are detected).

I also tried removing the find_phase method and the NaNs disappeared, so atan2 does seem to be the culprit. I’m a little ignorant about how autograd works, to be honest, but I wonder if adding epsilon here only guarantees a finite output from atan2 during the forward pass, and it’s still possible for the backward pass to feed it a zero-valued numerator and denominator?
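To test that hunch, a quick sketch (the 1e-23 value is my own pick, chosen only because its square falls below the smallest float32 subnormal and so underflows to zero):

```python
import torch

# A denominator that is nonzero (so my "== 0" eps nudge skips it)
# but so small that num**2 + den**2 underflows to exactly zero:
# (1e-23)**2 is about 1e-46, below the smallest float32 subnormal.
num = torch.zeros(1, requires_grad=True)
den = torch.full((1,), 1e-23, requires_grad=True)

phase = torch.atan2(num, den)
print(phase)  # finite -- the forward pass is fine

phase.backward()
print(num.grad, den.grad)  # num.grad blows up, den.grad is NaN
```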

I wasn’t sure exactly where to add this check, but I tried directly after atan2 in the find_phase method as well as at the output of the model itself. Both checks did return zero outputs now and then, although this is to be expected since the model ultimately returns a waveform, and find_phase returns the phase of a complex spectrogram, both of which may contain zero crossings, silent sections, etc. Could zero outputs cause atan2 to return NaN in the backward pass, I wonder?

But note that both inputs are also set to zero. Given you are adding a small eps value to the denominator (as seen in my previous code), this should not happen, but something is still causing the issue, which is why I suggested digging a bit into the actual values.

I figured it out! By digging into the actual values, as you suggested. My find_phase method above replaces exact zero values with eps, yes, but other values that are very close to zero (e.g. 1e-20) slip through unchanged, and their squares underflow to zero during the backward pass. The solution:
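In sketch form, the fix looks something like this (written as a free function so it runs standalone; I’ve reused the 1e-7 eps from before, but the exact threshold is a judgment call):

```python
import torch

def find_phase(subbands: torch.Tensor) -> torch.Tensor:
    numerator = subbands[:, 1:2, :, :]
    denominator = subbands[:, 0:1, :, :]
    epsilon = 1e-7
    # Push every near-zero denominator (not just exact zeros) away from
    # zero, preserving its sign, so that num**2 + den**2 in atan2's
    # backward pass can never underflow to zero.
    safe_sign = torch.where(denominator >= 0,
                            torch.ones_like(denominator),
                            -torch.ones_like(denominator))
    denominator = torch.where(denominator.abs() < epsilon,
                              safe_sign * epsilon,
                              denominator)
    return torch.atan2(numerator, denominator)
```

The key difference from before is the comparison on the magnitude (denominator.abs() < epsilon) instead of an exact equality with zero.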

Anyway, all fixed now. Thanks again for pointing me in the right direction! Been scratching my head for hours over this one and very relieved to see it resolved.