Would `torch.amp` cause slower convergence?

Hi guys,
I want to know whether torch.amp can cause slower convergence. After enabling torch.amp, my custom model seems to converge more slowly than when I train it without torch.amp.
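For context, here is a minimal sketch of the kind of amp training loop I mean (the toy model, random data, and hyperparameters are placeholders, not my actual setup):

```python
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
use_amp = device == "cuda"

# Toy model and random data stand in for the real custom model and loader.
model = nn.Linear(128, 10).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for step in range(100):
    data = torch.randn(32, 128, device=device)
    target = torch.randint(0, 10, (32,), device=device)

    optimizer.zero_grad()
    # The forward pass runs in mixed precision under autocast.
    with torch.amp.autocast(device_type=device, enabled=use_amp):
        output = model(data)
        loss = criterion(output, target)

    # GradScaler scales the loss to avoid float16 gradient underflow.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```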

Any answers and guidance would be appreciated!

No, this shouldn’t be the case as seen e.g. in these loss curves for RN50.


Hi @ptrblck, I noticed that at the beginning of the curve, the loss with FP32 seems to converge a little faster than with mixed precision, although they eventually reach similar accuracy.

That sounds reasonable, assuming you are not seeing a large divergence.
Using amp will not create bitwise-identical results to the float32 run, so the loss curves will show a bit of jitter during training and will not map perfectly onto each other.
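As a toy illustration (assuming a CUDA device is available; this is not meant to reproduce the RN50 curves), even a single operation under autocast already shows a small numerical difference compared to float32:

```python
import torch

torch.manual_seed(0)
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")

ref = a @ b  # plain float32 matmul
with torch.amp.autocast(device_type="cuda"):
    out = a @ b  # runs in float16 under autocast

# Small but nonzero difference; the loss curves jitter for the same reason.
print((ref - out.float()).abs().max())
```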


Thank you sincerely.