Distributed training gives nan loss but single GPU training is fine

royve · March 17, 2022, 11:41pm

I ran into the exact same problem.
Any chance you eventually found out what the problem was?

Thanks!