Can I perform inference with a quantized model on the GPU in PyTorch 1.6?

Can I perform inference on the GPU with a quantized model? The PyTorch documentation states that 'PyTorch 1.3 doesn’t provide quantized operator implementations on CUDA yet' (https://pytorch.org/docs/stable/quantization.html), but it is not clear whether any GPU support has been added in PyTorch 1.6, or whether it is planned for a future release.

We don’t currently support quantized model inference on the GPU; quantized operators are implemented only for the CPU backends (fbgemm/qnnpack). The docs have been updated to reflect that.
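
For reference, here is a minimal sketch of what does work in 1.6: dynamic quantization with inference on the CPU. The model and tensor sizes below are illustrative only, not from the question:

```python
import torch
import torch.nn as nn

# Illustrative float model; sizes are arbitrary.
model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 4))
model.eval()

# Dynamic quantization converts Linear weights to int8.
# The quantized kernels target CPU backends only.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)  # inputs must stay on CPU
with torch.no_grad():
    out = quantized(x)  # moving `quantized` or `x` to CUDA is expected to fail
print(out.shape)
```

If you need GPU inference, the usual options are to run the original float model on CUDA (possibly in fp16) or keep the quantized model on CPU.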