I’m wondering: can we quantize any model in PyTorch, or are there constraints on it?
If there are, what are those constraints?
Thanks in advance.
The constraints I can think of are:
- Op support: if some ops in the model don’t have a quantized version, they need to be implemented, or else skipped during quantization.
- Accuracy: quantization introduces some error in the output. If the quantized model’s error is larger than your tolerance, you may need to skip quantizing some operators in the model to get acceptable accuracy.
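To make the accuracy point concrete, here is a rough pure-Python sketch of an 8-bit affine quantize/dequantize round trip. It is not PyTorch's implementation. The helper names, the layer names `fc1`/`fc2`, the weights, and the tolerance are all made up for illustration. The idea is to measure the error per layer and keep the layers that exceed the tolerance in float.

```python
def quantize_dequantize(values, num_bits=8):
    """Round-trip floats through an unsigned affine integer grid (illustrative only)."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range is stretched to include 0.0.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(qmin - lo / scale)
    out = []
    for v in values:
        q = round(v / scale) + zero_point
        q = max(qmin, min(qmax, q))          # clamp to the integer range
        out.append((q - zero_point) * scale)  # map back to float
    return out


def max_abs_error(values, num_bits=8):
    """Largest absolute difference between the original and round-tripped values."""
    restored = quantize_dequantize(values, num_bits)
    return max(abs(a - b) for a, b in zip(values, restored))


# Hypothetical per-layer weights; the outliers in "fc2" stretch its range,
# so the small weights collapse onto the same integer.
layers = {
    "fc1": [-0.5, -0.1, 0.0, 0.2, 0.5],
    "fc2": [-100.0, 0.001, 0.002, 0.003, 100.0],
}
tolerance = 0.01  # assumed acceptable error, chosen arbitrarily

keep_in_float = [name for name, w in layers.items() if max_abs_error(w) > tolerance]
print(keep_in_float)
```

In a real model you would compare the quantized model's outputs against the float model's on representative data rather than looking at weights alone. In eager mode quantization, as far as I know, the usual way to skip a submodule is to set its `qconfig` to `None` before calling prepare.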