I am using the NVIDIA Apex package to speed up training of my CNN model. I compared the performance of the traditional Adam algorithm against Apex's O1 mixed-precision optimization with the following code:
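Roughly, the Apex O1 path looks like this (a minimal sketch, not my exact script; `model`, `criterion`, and `train_loader` are assumed to be defined elsewhere):

```python
import torch
from apex import amp  # NVIDIA Apex (https://github.com/NVIDIA/apex)

# model, criterion, and train_loader are assumed to be defined elsewhere
model = model.cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# O1 patches selected ops to run in FP16 while keeping FP32 master weights
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

for inputs, targets in train_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    # scale the loss before backward to avoid FP16 gradient underflow
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()
```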
The training process is visibly sped up, roughly 3-4x compared with the traditional Adam baseline. But when I evaluate the trained model, I find that the model trained with Apex performs worse on the test set than the one trained with plain Adam. Are there any solutions? I want to speed up training while still obtaining good performance on the test set.
Hi, @ptrblck. I have switched to the torch.amp utilities, but I have run into another problem. Training for the same 500 epochs, the torch.amp run is more prone to overfitting than the plain-Adam method. How can I solve this problem?
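For reference, my loop follows the usual native-AMP pattern; here is a minimal sketch (again assuming `model`, `optimizer`, `criterion`, and `train_loader` are defined elsewhere):

```python
import torch

# model, optimizer, criterion, and train_loader are assumed to be defined
scaler = torch.cuda.amp.GradScaler()

for inputs, targets in train_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    # run the forward pass and loss computation in mixed precision
    with torch.cuda.amp.autocast():
        outputs = model(inputs)
        loss = criterion(outputs, targets)
    # scale the loss, backprop, step with unscaled grads, update the scale
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```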
We haven't seen any overfitting issues using amp, and a few examples are given e.g. in this blog post. It would be great if you could provide more information about the expected target accuracy and the mean +/- stddev of the achieved accuracy when comparing the different approaches.
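To collect those statistics, you could rerun each configuration with a few random seeds and report the mean and standard deviation of the final test accuracy. A minimal sketch, assuming a hypothetical `train_and_evaluate()` helper that trains from scratch and returns the test accuracy:

```python
import statistics
import torch

# train_and_evaluate is a hypothetical helper that trains the model from
# scratch and returns the final test-set accuracy as a float
accuracies = []
for seed in (0, 1, 2, 3, 4):
    torch.manual_seed(seed)
    accuracies.append(train_and_evaluate())

print(f"accuracy: {statistics.mean(accuracies):.4f} "
      f"+/- {statistics.stdev(accuracies):.4f}")
```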