Getting NaNs from dropout layer

Hi, I ran into the same problem and found the cause.

The dropout layer itself doesn’t cause the NaN values.

There is a softmax layer right before the dropout layer, and it is the softmax that produces the NaNs:
https://github.com/huggingface/transformers/blob/972fdcc77878cf7afcc8aef8979d6b4241005bb6/src/transformers/models/bert/modeling_bert.py#L355

Does anyone have a suggestion for solving the numerical instability of the softmax layer?

I’m implementing mixed-precision training, and I found that the softmax layer is not stable with fp16 inputs.
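
For reference, here is a minimal sketch of how I understand the failure (plain PyTorch, not the actual BertSelfAttention code, and the numbers are made up just to trigger the overflow): the attention scores overflow the fp16 range, the row then contains inf, and a softmax over a row containing inf is NaN everywhere, which dropout simply passes through.

```python
import torch

# fp16 can only represent values up to 65504, so large attention scores
# (QK^T / sqrt(d) + mask) overflow to inf. A softmax over a row containing
# inf is NaN for every entry, because exp(inf - inf) is undefined, and the
# following dropout layer just passes the NaNs through.
raw_scores = torch.tensor([[1.0, 2.0, 7.0e4]])      # fp32 "attention scores"

scores_fp16 = raw_scores.half()
print(scores_fp16)                                   # tensor([[1., 2., inf]], dtype=torch.float16)
print(torch.softmax(scores_fp16.float(), dim=-1))    # all NaN: upcasting after the overflow is too late

# Possible workaround: keep the score computation and the softmax in fp32,
# and only cast the resulting probabilities back to fp16 afterwards.
probs = torch.softmax(raw_scores, dim=-1).half()
print(probs)                                         # tensor([[0., 0., 1.]], dtype=torch.float16)
```

Note that once inf has already appeared in the fp16 scores, upcasting just the softmax input doesn’t help, so the scores themselves would need to stay in a wider dtype up to the softmax. I’d be glad to hear if there is a better-supported way to do this with the Hugging Face implementation.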

Thanks!