Hey all. I've been looking into quantization recently for my final university project. As far as I can tell, PyTorch supports quantization down to 8 bits at most. Is there any way to quantize my neural network to a lower precision (e.g. 4-bit or 2-bit), or is that not possible? Any help would be appreciated.
It's not supported yet. I answered this in another thread: "Quantization aware training lower than 8-bits?"
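If you only need to study the accuracy impact of lower bit widths for a project, one common workaround is to simulate ("fake") quantization in floating point: round each value to one of 2^n levels and map it back to a float. The sketch below is plain Python, not a PyTorch API; the function name `fake_quantize` and its parameters are my own, and it assumes a simple per-tensor, unsigned affine scheme. It gives no real memory or speed savings, it only mimics the rounding error.

```python
def fake_quantize(values, num_bits=4):
    """Simulate unsigned affine quantization of a list of floats.

    Hypothetical helper for illustration: values are snapped to at most
    2**num_bits levels and returned as floats (no integer storage).
    """
    qmin, qmax = 0, 2 ** num_bits - 1

    # Extend the observed range to include zero so 0.0 stays exact.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)

    # Step size between adjacent levels; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / (qmax - qmin) or 1.0
    # Integer level that represents the real value 0.0.
    zero_point = round(qmin - lo / scale)

    out = []
    for v in values:
        q = round(v / scale) + zero_point      # quantize
        q = max(qmin, min(qmax, q))            # clamp to the n-bit range
        out.append((q - zero_point) * scale)   # dequantize back to float
    return out
```

For example, `fake_quantize(weights, num_bits=2)` returns a list of the same length holding at most four distinct values. Applying something like this to the weights (and optionally activations) during evaluation gives a rough idea of how much accuracy a 4-bit or 2-bit network would lose.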