PyTorch AMP matmul inf

Hi, I’m trying to use PyTorch AMP now that it has been released. However, I’m getting NaN values in my model output. I traced the problem to the output of a matmul, which contains some infinite values; the inputs to the matmul operation are fine.

More particularly, I’m using the non-local block from here: Non-local_pytorch/ at master · AlexHex7/Non-local_pytorch · GitHub

How do I deal with this?


Does the same error happen in normal FP32 training?
If not, I think running the block in an autocast(enabled=False) context is one option. With this context, though, you’ll need to convert some input tensors to FP32 yourself.

autocast(enabled=False) subregions can be nested in autocast-enabled regions. Locally disabling autocast can be useful, for example, if you want to force a subregion to run in a particular dtype. Disabling autocast gives you explicit control over the execution type. In the subregion, inputs from the surrounding region should be cast to dtype before use: (quote from Automatic Mixed Precision package - torch.cuda.amp — PyTorch 1.8.0 documentation)
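A minimal sketch of this approach, assuming the problematic matmul sits somewhere inside an autocast-enabled forward pass (the tensor names `theta_x` and `phi_x` stand in for the embedded features of the non-local block; shapes are made up for illustration):

```python
import torch

# Stand-in tensors for the non-local block's embeddings (shapes assumed).
theta_x = torch.randn(2, 64, 32)
phi_x = torch.randn(2, 32, 64)

with torch.cuda.amp.autocast(enabled=True):
    # ... other layers may run in FP16 here ...
    with torch.cuda.amp.autocast(enabled=False):
        # Inputs produced under autocast may be FP16, so cast them
        # back to FP32 before the numerically sensitive matmul.
        f = torch.matmul(theta_x.float(), phi_x.float())
```

The matmul then accumulates in FP32, which avoids the FP16 overflow that produces the inf values.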


A possible solution is to scale down the values of one of the two matrices, or both of them, before the matmul operation. You can try something similar to what is done in softmax attention.

For example, you can try doing it this way:

f = torch.matmul(theta_x / math.sqrt(your_hidden_size), phi_x)
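As a self-contained sketch of that line (the hidden size and tensor shapes below are assumptions for illustration; in the non-local block `your_hidden_size` would be the inter-channel dimension that the matmul contracts over):

```python
import math
import torch

# Assumed inter-channel dimension (the size being contracted in the matmul).
hidden_size = 64
theta_x = torch.randn(2, 128, hidden_size)
phi_x = torch.randn(2, hidden_size, 128)

# Dividing by sqrt(hidden_size), as in scaled dot-product attention,
# keeps the products within FP16 range under autocast.
f = torch.matmul(theta_x / math.sqrt(hidden_size), phi_x)
```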

Yes, I’ll try that and report back if I find anything. Thanks.