Convert_syncbn_model causes gradient overflow with apex mixed precision

I measured the performance following the post below: instead of a plain `while` loop, I run the forward pass, backward pass, optimizer step, and gradient zeroing, and print how long each iteration takes.
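A minimal sketch of that per-iteration timing, with a hypothetical `dummy_step` standing in for the real forward/backward/step/zero_grad work (stdlib only; with CUDA you would also need to synchronize before reading the clock, since kernels launch asynchronously):

```python
import time

def time_iteration(step_fn):
    # Time one training iteration. With a real CUDA model you would
    # call torch.cuda.synchronize() before each perf_counter() read,
    # otherwise you only measure kernel launch time.
    start = time.perf_counter()
    step_fn()
    return time.perf_counter() - start

def dummy_step():
    # Hypothetical placeholder for forward + backward +
    # optimizer.step() + optimizer.zero_grad().
    sum(i * i for i in range(10_000))

times = [time_iteration(dummy_step) for _ in range(5)]
print(f"mean iteration time: {sum(times) / len(times):.6f}s")
```

Without the synchronization point, per-iteration numbers for a GPU model can look misleadingly fast or lumpy, which is worth ruling out before blaming the training itself.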

I can confirm that training has not run into any problems so far, apart from being slow :confused:.
Do you think the CUDA version might be causing this? Is there any verbose/debug mode I can enable, for example something that prints a message the way apex does when there is a gradient overflow and it is adjusting the loss scale?

Thanks for the help :slight_smile:.