NaN Loss with torch.cuda.amp and CrossEntropyLoss

You could use forward hooks to find the layer where the first invalid (NaN or Inf) output is created, which should narrow down the issue.
Here is an example of using forward hooks to get the intermediate activations.
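
A minimal sketch of this idea, assuming a toy `nn.Sequential` model: the helper `register_nan_hooks` and the deliberately broken `DivideByZero` layer are hypothetical names made up for illustration, not part of PyTorch. Each hook records the name of its module whenever the output contains NaN or Inf, so the first recorded name is the first layer that produced an invalid value.

```python
import torch
import torch.nn as nn


def register_nan_hooks(model, bad_layers):
    """Attach a forward hook to each submodule and record non-finite outputs."""
    def make_hook(name):
        def hook(module, inputs, output):
            # This sketch only checks plain tensor outputs (not tuples or dicts).
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                bad_layers.append(name)
        return hook

    handles = []
    for name, module in model.named_modules():
        if name:  # skip the root module, whose name is the empty string
            handles.append(module.register_forward_hook(make_hook(name)))
    return handles


class DivideByZero(nn.Module):
    """Hypothetical broken layer: x / 0 yields Inf or NaN for every element."""
    def forward(self, x):
        return x / torch.zeros_like(x)


model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), DivideByZero(), nn.Linear(4, 2))

bad_layers = []
handles = register_nan_hooks(model, bad_layers)

with torch.no_grad():
    model(torch.randn(8, 4))

# Hooks fire in execution order, so the first entry is the earliest offender.
print("first invalid output in layer:", bad_layers[0])

# Remove the hooks once debugging is done.
for handle in handles:
    handle.remove()
```

In a real training run you would register the hooks on your own model and run a single forward pass inside the `autocast` context with a batch that reproduces the NaN.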

I would start by checking the model output first. If it’s valid, this would point to the custom loss function, which might create e.g. overflows.
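
As a rough sketch of that check (the logits, the target, and the helper name `check_finite` are made-up assumptions, and the cast to `float16` only simulates what a manual `exp` would produce under mixed precision): the model output is finite, yet a naive exponentiation overflows in `float16`, since its largest representable value is 65504. The built-in `F.cross_entropy` stays finite on the same input.

```python
import torch
import torch.nn.functional as F


def check_finite(name, tensor):
    """Print and return whether the tensor is free of NaN/Inf values."""
    ok = bool(torch.isfinite(tensor.float()).all())
    print(f"{name}: {'ok' if ok else 'contains NaN/Inf'}")
    return ok


# Hypothetical model output (logits) and target for a single sample.
logits = torch.tensor([[2.0, 12.0, -3.0]])
target = torch.tensor([1])

# Step 1: check the model output itself.
output_ok = check_finite("model output", logits)

# Step 2: a custom loss that calls exp() directly can overflow in float16,
# because exp(12) is roughly 162755, well above the float16 maximum of 65504.
exp_fp16 = torch.exp(logits).to(torch.float16)
naive_ok = check_finite("naive exp in float16", exp_fp16)

# The built-in loss uses a numerically stable log-softmax internally.
loss = F.cross_entropy(logits, target)
loss_ok = check_finite("F.cross_entropy", loss)
```

If the output check passes but the loss check fails in your setup, the custom loss is the place to look, e.g. by computing it in `float32`.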