I’ve been running into sudden NaNs when I train with Adam in half precision (float16). My nets train just fine in half precision with SGD + Nesterov momentum, and they train just fine with Adam in single precision (float32), but switching the Adam runs over to half precision seems to cause numerical instability. I’ve fiddled with the hyperparams a bit; raising epsilon helps a little but doesn’t fix the issue.
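In case it helps narrow things down, here’s a minimal sketch (using numpy’s float16 as a stand-in for the optimizer state) of why epsilon might matter here: Adam’s default eps of 1e-8 is below float16’s smallest subnormal (~6e-8), so if eps and the second-moment estimate are both stored in half precision they can underflow to exactly zero, turning the update into a division by zero. This is just my guess at the mechanism, not a confirmed diagnosis:

```python
import numpy as np

# Adam's default eps (1e-8) is below float16's smallest
# subnormal (~5.96e-8), so it rounds to exactly 0.0 in half precision.
eps = np.float16(1e-8)
print(eps)  # 0.0

# With modest gradients, the second-moment estimate v = E[g^2]
# underflows too: (1e-4)^2 = 1e-8 -> 0.0 in float16.
grad = np.float16(1e-4)
v = grad * grad
print(v)  # 0.0

# The Adam denominator sqrt(v) + eps is then exactly zero,
# so the step blows up to inf (and NaN once it mixes with weights).
denom = np.sqrt(v) + eps
update = grad / denom
print(np.isinf(update))  # True
```

That would also be consistent with raising epsilon only helping “a tiny bit”: a larger eps keeps the denominator nonzero, but the v underflow still zeroes out the adaptive scaling.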
Is this something anyone else has info on? If not, I can throw together a reproduction script and dig into the issue.
Thanks again! Been a good while since I’ve had to post on account of hitting no issues otherwise.