PyTorch QAT quantization significantly slows down ViT training

Hey, thanks for the reply! I did use it, and while I observed some improvement with fused fake quantization, training is still slower than the model without any quantization. Shouldn't there be an improvement in computation time?
(I will share a screenshot of the CPU time for the fused fake quantized model soon.)

Currently BackendConfig is not supported by eager mode quantization, and I think it will take a long time for eager mode to support it. I would recommend using FX graph mode quantization instead.
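For reference, a minimal sketch of the FX graph mode flow with an explicit BackendConfig, assuming a recent PyTorch release (the Sequential model, qconfig mapping, and example inputs below are just placeholders; older releases use slightly different config APIs):

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.backend_config import get_native_backend_config
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

# Placeholder model and calibration input
model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU()).eval()
example_inputs = (torch.randn(1, 16),)

qconfig_mapping = get_default_qconfig_mapping("fbgemm")
backend_config = get_native_backend_config()  # start from the native config and customize as needed

prepared = prepare_fx(model, qconfig_mapping, example_inputs,
                      backend_config=backend_config)
prepared(*example_inputs)                      # calibration pass
quantized = convert_fx(prepared, backend_config=backend_config)
```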

No, we don't expect to see an improvement in training time, since we are adding extra operations (fake quantize ops) to the model. We do expect to see an improvement for the converted model, which is a real quantized model.
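To make the distinction concrete, here is a rough sketch of the eager mode QAT flow (SmallNet and the loop below are placeholders, not the ViT setup from the question): prepare_qat inserts the fake quantize ops that make training slower, and the int8 speedup only shows up after convert.

```python
import torch
import torch.ao.quantization as tq

class SmallNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.fc = torch.nn.Linear(16, 4)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.fc(self.quant(x)))

model = SmallNet()
model.qconfig = tq.get_default_qat_qconfig("fbgemm")
prepared = tq.prepare_qat(model.train())   # inserts fake quantize ops -> extra work per step

for _ in range(3):                         # stand-in for the usual fp32 training loop
    out = prepared(torch.randn(8, 16))
    out.sum().backward()

model_int8 = tq.convert(prepared.eval())   # real int8 model: this is where the speedup appears
```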

Okay, so we are supposed to see the improvement in the converted model, during evaluation? I would also really appreciate help with another question: does the increase in training time happen with other quantization schemes too?

Okay, so we are supposed to see the improvement in the converted model, during evaluation?

Yes

I would also really appreciate help with another question: does the increase in training time happen with other quantization schemes too?

Yes, since we are fake quantizing some activation/weight Tensors in addition to running the original fp32 ops during quantization aware training.
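As a small illustration, a fake quantize module is simply an extra op on top of the fp32 compute: it observes statistics and simulates int8 rounding, but the tensor stays in float32 (FakeQuantize here is constructed with its defaults).

```python
import torch
from torch.ao.quantization import FakeQuantize

fq = FakeQuantize()      # default moving-average min/max observer
x = torch.randn(4, 8)
y = fq(x)                # extra work: update stats + simulate int8 rounding
print(y.dtype)           # torch.float32 -- no real int8 kernels run during QAT
```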

I know, thanks, but I want to change the bias type (to qint8).

I have another question: I need to set the scale to 1/2, 1/4, 1/8, ..., 2^-n. Where do I need to make the change?
Also, can I output the quantized values as float32? Right now the data is int, for example:
1660464744946

Okay. Thanks a lot! Really appreciate the help.

Do you have any ideas?

Currently it's only supported in FX graph mode quantization. Are you able to use that? See: (prototype) FX Graph Mode Post Training Static Quantization — PyTorch Tutorials 1.12.1+cu102 documentation

You would need to write a new observer or fake quantize module and customize calculate_qparams: https://github.com/pytorch/pytorch/blob/master/torch/ao/quantization/observer.py#L294
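For example, a rough sketch of such an observer (PowerOfTwoMinMaxObserver is a made-up name, not something that ships with torch.ao.quantization) that reuses MinMaxObserver's statistics and snaps the scale to the nearest power of two:

```python
import torch
from torch.ao.quantization.observer import MinMaxObserver

class PowerOfTwoMinMaxObserver(MinMaxObserver):
    """Hypothetical observer: keep MinMaxObserver's min/max tracking but
    round the resulting scale to the nearest power of two (..., 1/8, 1/4, 1/2)."""

    def calculate_qparams(self):
        scale, zero_point = super().calculate_qparams()
        # 2 ** round(log2(scale)) snaps the scale onto the 2^-n grid
        pow2_scale = torch.pow(2.0, torch.round(torch.log2(scale)))
        return pow2_scale, zero_point
```

You would then plug it into your QConfig, e.g. via FakeQuantize.with_args(observer=PowerOfTwoMinMaxObserver) for QAT, or PowerOfTwoMinMaxObserver.with_args(...) for post training quantization.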

Actually, we have a recent intern project that implements the additive power-of-two quantization method; maybe you can take a look at that as well: https://github.com/pytorch/pytorch/tree/master/torch/ao/quantization/experimental

Hi, can LSTM be quantized? In eager mode or FX?

We have support for eager mode quantization of LSTM through our custom module API, and recently @andrewor added support for FX graph mode quantization as well.
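If it helps, here is a rough sketch of what the eager mode custom module flow for a static LSTM can look like; the module paths (torch.ao.nn.quantizable / torch.ao.nn.quantized) and config dict keys may differ between versions, so treat the tests linked below as the authoritative reference.

```python
import torch
import torch.ao.nn.quantizable as nnqa   # observed (quantizable) LSTM
import torch.ao.nn.quantized as nnq      # quantized LSTM
import torch.ao.quantization as tq

class LSTMModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()
        self.lstm = torch.nn.LSTM(16, 16, batch_first=True)
        self.dequant = tq.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x, _ = self.lstm(x)
        return self.dequant(x)

model = LSTMModel().eval()
model.qconfig = tq.get_default_qconfig("fbgemm")

# prepare: swap the float LSTM for its observed (quantizable) counterpart
prepared = tq.prepare(
    model,
    prepare_custom_config_dict={
        "float_to_observed_custom_module_class": {torch.nn.LSTM: nnqa.LSTM}
    },
)
prepared(torch.randn(2, 4, 16))  # calibrate with representative data

# convert: swap the observed LSTM for the real quantized LSTM
quantized = tq.convert(
    prepared,
    convert_custom_config_dict={
        "observed_to_quantized_custom_module_class": {nnqa.LSTM: nnq.LSTM}
    },
)
```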

test for eager: pytorch/test_quantized_op.py at master · pytorch/pytorch · GitHub
test for fx: pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub

Also, please open a new post for a new question, instead of replying to an unrelated post, so that other people can find it as well.

OK, thank you. I just have one last question: on the COCO dataset, is the accuracy of QAT good? Or is the accuracy loss large?

We don't have the numbers. I think it will depend on how you do QAT, but typically it should work reasonably well for vision models.