**TL;DR:** when I use AMP `GradScaler` with two different losses (scaling each one separately), training crashes after about 100 epochs due to `NaN` weights on backward.

I am trying to train a self-implemented DC-CDN, which uses two losses (Contrastive Depth Loss and Mean Squared Error).

In my implementation I’ve used `autocast` for both the forward function and the losses’ computation (in particular, if it helps, I use `autocast` as a decorator on both of these functions, so as to make sure it is never enabled at any other point during training).
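
For reference, this is roughly the decorator pattern I mean (the class body, the `compute_losses` helper, and the loss stand-ins below are placeholders, not my actual code):

```
import torch
import torch.nn as nn
from torch.cuda.amp import autocast

class DCCDN(nn.Module):              # placeholder for my actual model
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 1, 3, padding=1)

    @autocast()                      # mixed precision only inside forward
    def forward(self, x):
        return self.net(x)

@autocast()                          # and only inside the loss computation
def compute_losses(pred, target):
    loss1 = nn.functional.l1_loss(pred, target)   # stand-in for the contrastive depth loss
    loss2 = nn.functional.mse_loss(pred, target)  # the MSE term
    return loss1, loss2
```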

I’ve also used `GradScaler`, initially summing both losses:

```
# scale the summed loss once, backprop, then step and update the scaler
scaler.scale(loss1 + loss2).backward()
scaler.step(opt)
scaler.update()
```

However, as I’ve learned from the AMP Recipe, this fits an advanced use case, so I’ve changed the above code to this:

```
# each loss is scaled separately; retain the graph so the second
# backward can still traverse the shared part of the graph
scaler.scale(loss1).backward(retain_graph=True)
scaler.scale(loss2).backward()
scaler.step(opt)
scaler.update()
```

This was after reading this GitHub issue’s discussion.

Disabling `GradScaler` or `autocast` (just one, or both) has allowed me to finish my experiments without crashing, but my understanding is that this could lead to future issues: disabling `autocast` means longer training times, which is not ideal, and training without gradient scaling could produce `NaN` weights on certain datasets.
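
Concretely, by “disabling” I mean switching each piece off through its `enabled` flag; a simplified sketch (context-manager form for brevity, same placeholder names as before):

```
use_autocast = False     # turn autocast off
use_scaler = False       # turn gradient scaling off

scaler = torch.cuda.amp.GradScaler(enabled=use_scaler)

with torch.cuda.amp.autocast(enabled=use_autocast):
    preds = model(inputs)
    loss1, loss2 = compute_losses(preds, targets)

# with enabled=False, scale() returns the loss unchanged, step() just calls
# opt.step(), and update() is a no-op, so the rest of the loop stays the same
scaler.scale(loss1).backward(retain_graph=True)
scaler.scale(loss2).backward()
scaler.step(opt)
scaler.update()
```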

Also, the fact that the `GradScaler` step is not preventing the `NaN` weights as it stands hints at something being wrong with my implementation.
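
For what it’s worth, this is the kind of diagnostic I can wrap around the backward/step calls to see whether the non-finite values show up first in the gradients or in the weights themselves (just a sketch, reusing the placeholder names from above):

```
scaler.scale(loss1).backward(retain_graph=True)
scaler.scale(loss2).backward()
scaler.unscale_(opt)                    # gradients are now in real (unscaled) units
for name, p in model.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print(f"non-finite gradient in {name}")
    if not torch.isfinite(p).all():
        print(f"non-finite weight in {name}")
scaler.step(opt)                        # step() knows unscale_ was already called
scaler.update()
```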