Hey, I apologize for reposting this question, but I need an answer for a submission and it is still unanswered.
I am using PyTorch's QAT (quantisation-aware training) to compress a ViT model. The model does shrink significantly, but training slows down: a single epoch without QAT takes 149.9 secs, while the same model trained with identical hyper-parameters and QAT enabled takes 273 secs.
I did not expect this, since the only change was enabling quantisation. The memory footprint shrank 3.89x, which was expected. Can someone help me figure out why the training time increased?
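For reference, here is a minimal sketch of the kind of eager-mode QAT setup I am describing, using `torch.ao.quantization`. The tiny `nn.Sequential` model is just a hypothetical stand-in for the ViT; the prepare/train/convert flow is the same. Note that `prepare_qat` inserts fake-quantize modules that run in every forward and backward pass, which is presumably where the extra wall-clock time per epoch comes from:

```python
import torch
import torch.nn as nn
from torch.ao.quantization import get_default_qat_qconfig, prepare_qat, convert

# Hypothetical small model standing in for the ViT
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
model.train()

# Attach a QAT qconfig and insert fake-quantize observers
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model)

# One training step: the fake-quant ops run in forward AND backward,
# adding overhead on top of the ordinary float computation
opt = torch.optim.SGD(qat_model.parameters(), lr=0.01)
x, y = torch.randn(8, 16), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(qat_model(x), y)
loss.backward()
opt.step()

# After training, convert to a real int8 model (this is where the
# ~4x memory shrink comes from: int8 weights vs fp32)
int8_model = convert(qat_model.eval())
```

During QAT itself the model still trains in floating point with simulated quantisation inserted, so the per-step cost is strictly higher than the float baseline; the int8 speed/size benefit only materialises after `convert`.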
link to the original thread - Pytorch QAT quantisation slows down the training of ViT significantly