Can I perform inference with a quantized model on the GPU in PyTorch 1.6?

Can I perform inference on the GPU with a quantized model? The PyTorch documentation states that 'PyTorch 1.3 doesn’t provide quantized operator implementations on CUDA yet' (https://pytorch.org/docs/stable/quantization.html), but it is not clear whether any GPU support has been added in PyTorch 1.6, or whether it is planned for a future release.

We don’t currently support quantized model inference on the GPU; quantized operators are implemented only for the CPU backends (fbgemm/qnnpack). The docs have been updated to reflect that.
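
For reference, here is a minimal sketch of what does work in 1.6: dynamic quantization with inference on the CPU. The model and tensor sizes below are illustrative only, not from the question:

```python
import torch
import torch.nn as nn

# Illustrative float model; sizes are arbitrary.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
model.eval()

# Dynamic quantization converts Linear weights to int8.
# The quantized kernels target CPU backends only.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)  # inputs must stay on CPU
with torch.no_grad():
    out = quantized(x)  # moving `quantized` or `x` to CUDA is expected to fail
print(out.shape)
```

If you need GPU inference, the usual options are to run the original float model on CUDA (possibly in fp16) or keep the quantized model on CPU.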