PyTorch QAT quantisation slows down the training of ViT significantly (reposting the question)

Hey, I apologize for reposting this question, but I need the answer for a submission and the original is still unanswered :frowning_face:
I am using PyTorch’s QAT to compress a ViT model. The model does shrink to a significant extent, but training slows down: a single epoch without QAT takes 149.9 secs, while the same run with identical hyper-parameters (QAT on) takes 273 secs.
This shouldn’t happen, as the only change was the model becoming quantised. The memory footprint shrank 3.89x, which was expected. Can someone help me figure out why this is happening?
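For context, my setup is roughly the standard eager-mode QAT flow below (a simplified sketch, not my exact code; `create_vit` is a placeholder for the actual model constructor and the training loop is omitted):

```python
import torch
import torch.ao.quantization as tq

# Placeholder for the actual ViT constructor and data pipeline
model = create_vit()
model.train()

# Attach a QAT config and insert fake-quantize observers
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
tq.prepare_qat(model, inplace=True)

# ... regular fp32 training loop runs here; the inserted
# FakeQuantize/observer modules execute on every forward pass ...

# After training, convert to an actual int8 model for inference
model.eval()
quantized_model = tq.convert(model)
```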

link to the original thread - Pytorch QAT quantisation slows down the training of ViT significantly

Why not just ask in the original thread? It seems there’s still an active discussion there.

Hey. Yes, the thread is active now. It did go cold for a bit and I’m in a rush to figure this out :sweat_smile: