I am working on quantization-aware training (QAT) for my model.
I compared two training cases: training an FP32 model vs. QAT based on the FP32 model.
What I observed is that the time per epoch is similar in both cases.
But when comparing how quickly the loss decreases, I found that QAT is extremely slow.
For example, to bring the loss from 5.0 down to 1.5 (the numbers are just illustrative), FP32 training needs only 50 epochs. QAT, however, has taken 6k epochs to go from 5.0 to 3.5, and the loss seems to be decreasing more and more slowly.
BTW, the learning rate and optimizer are the same for both training runs.
Is this expected?