Quantization-aware training below 8 bits?

Hello. I am not a PyTorch expert, but I need to quantize my model to fewer than 8 bits (e.g. 4 bits, 2 bits, etc.). I've seen that PyTorch does not officially support this kind of "aggressive" quantization. Is there any way to do it? Is there any documentation with steps to follow (or something similar)? As I said, I'm not an expert. Also, I don't just need to evaluate the accuracy of the quantized model; I also need to compress it so that I can deploy it on a microcontroller or a mobile device. Thanks in advance to whoever tries to help.

@supriyar added the quint4x2 dtype to quantization; some tests can be found in pytorch/test_quantize_fx.py at master · pytorch/pytorch · GitHub. However, we do not have int4 dtype support for any ops except embedding_bag.
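For anyone landing here, a minimal eager-mode sketch of what the reply above describes: weight-only 4-bit (quint4x2) quantization of an `nn.EmbeddingBag`, the one op that currently has sub-8-bit support. This is not from the original post; it assumes the `float_qparams_weight_only_qconfig_4bit` qconfig that ships in recent releases (the exact name and import path may differ across versions), and `ToyEmbeddingModel` is a made-up toy model.

```python
import torch
import torch.nn as nn
from torch.quantization import (
    float_qparams_weight_only_qconfig_4bit,  # 4-bit weight-only qconfig (recent releases)
    prepare,
    convert,
)

class ToyEmbeddingModel(nn.Module):
    # hypothetical toy model, just to show the flow
    def __init__(self):
        super().__init__()
        # the quantized embedding_bag path expects mode="sum";
        # include_last_offset=True matches the quantized op's convention
        self.emb = nn.EmbeddingBag(num_embeddings=100, embedding_dim=16,
                                   mode="sum", include_last_offset=True)

    def forward(self, indices, offsets):
        return self.emb(indices, offsets)

m = ToyEmbeddingModel().eval()
# set the 4-bit weight-only qconfig on the embedding module only
m.emb.qconfig = float_qparams_weight_only_qconfig_4bit
prepare(m, inplace=True)
convert(m, inplace=True)
# m.emb is now a quantized EmbeddingBag with quint4x2-packed weights

indices = torch.tensor([0, 1, 2, 3])
offsets = torch.tensor([0, 2, 4])  # with include_last_offset, the last entry closes the final bag
out = m(indices, offsets)
```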

We are currently working on a design doc for supporting custom backends, and I think it will be useful for this as well. Please keep an eye out for design docs in GitHub issues; I may update here if I still remember this post.


It's 2021-11-03 now. Any updates?

What dtype do you need? We only have quint4x2 currently, but contributions are welcome. We don't have other use cases right now, so it might not make much sense for us to add other dtypes.
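In the meantime, a common workaround for the original question is to *simulate* sub-8-bit QAT by narrowing the integer range of `FakeQuantize` (e.g. `quant_min=0, quant_max=15` for unsigned 4-bit). This is not an official low-bit flow from this thread, just a sketch under those assumptions; the toy model and qconfig names are made up for illustration.

```python
import torch
import torch.nn as nn
from torch.quantization import (
    FakeQuantize,
    MovingAverageMinMaxObserver,
    QConfig,
    prepare_qat,
)

# 4-bit integer grids: unsigned [0, 15] for activations, signed [-8, 7] for weights
act_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=0, quant_max=15,
    dtype=torch.quint8, qscheme=torch.per_tensor_affine,
)
weight_fake_quant = FakeQuantize.with_args(
    observer=MovingAverageMinMaxObserver,
    quant_min=-8, quant_max=7,
    dtype=torch.qint8, qscheme=torch.per_tensor_symmetric,
)
four_bit_qconfig = QConfig(activation=act_fake_quant, weight=weight_fake_quant)

model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU())  # toy model
model.train()
model.qconfig = four_bit_qconfig
prepare_qat(model, inplace=True)

# ...run the usual training loop here; fake-quant now rounds to 16 levels...
x = torch.randn(1, 3, 32, 32)
y = model(x)  # forward pass goes through the 4-bit fake-quant modules
```

Note this only simulates 4-bit numerics during training so you can measure accuracy; `convert` would still emit int8 kernels. Actually packing and executing 4-bit weights on a microcontroller or mobile device needs backend/kernel support, which is exactly the gap discussed above.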