RuntimeError: value cannot be converted to type at::Half without overflow: -1e+30

Hi,
I am training with mixed precision using torch.cuda.amp.autocast(enabled=True), and I am having trouble with a masking operation in a transformer model. I need to mask out every position where the attention mask is 0 before computing the softmax. To do this, I fill those positions with a very large negative value, so that the softmax assigns them zero attention. Unfortunately, with mixed precision this operation overflows. The operation is:

_MASKING_VALUE=-1e30
masked_attn_logits = attn_logits.masked_fill(attn_mask==0, value=_MASKING_VALUE)

I already tried reducing the magnitude of the masking value to -1e6, but that did not help. Can you help me?

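Solution:
float16 can only represent finite values up to a magnitude of 65504, so both -1e30 and -1e6 overflow when the logits are in half precision. Pick the masking value based on the dtype of the logits: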
_MASKING_VALUE = -1e+30 if attn_logits.dtype == torch.float32 else -1e+4
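
A possible alternative to hard-coding -1e+4 is to ask PyTorch for the most negative finite value of whatever dtype the logits have, using torch.finfo. The sketch below is only an illustration: the helper name mask_attn_logits and the toy tensors are made up for this example, and the explicit .float() cast before the softmax is there only so the example can run on CPU without autocast.

```python
import torch


def mask_attn_logits(attn_logits: torch.Tensor, attn_mask: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: fill masked positions with a dtype-safe value."""
    # Most negative finite value representable in the logits' dtype
    # (-65504 for float16), so masked_fill cannot overflow.
    masking_value = torch.finfo(attn_logits.dtype).min
    return attn_logits.masked_fill(attn_mask == 0, masking_value)


# Toy half-precision logits and a 0/1 mask (assumed shapes, for illustration only).
logits = torch.randn(2, 4, dtype=torch.float16)
mask = torch.tensor([[1, 1, 0, 0],
                     [1, 0, 1, 1]])

masked = mask_attn_logits(logits, mask)

# Compute the softmax in float32; masked positions get zero attention.
probs = torch.softmax(masked.float(), dim=-1)
```

Two caveats with this approach: if every position in a row is masked, the softmax returns a uniform distribution over that row rather than zeros, and adding any further negative bias to logits that already sit at the dtype minimum can overflow to -inf. If either matters in your model, a smaller magnitude such as the -1e+4 above may be the safer choice.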