Quantization for GPU in native PyTorch

fabian_schutze · January 4, 2023, 5:38pm

Is there a way to do quantization (mostly 8-bit) on GPUs in native PyTorch while avoiding TensorRT?

The docs seem to indicate that quantization on GPUs is only possible with TensorRT. Is that correct? If it is not available in main, is there maybe a PR I could work with?

I am grateful for any hints or suggestions.

Vasiliy_Kuznetsov · January 13, 2023, 4:06pm

Hi @fabian_schutze, we are considering GPU quantization for future work, but we don’t currently have it in a usable form.

jerryzh168 · January 26, 2023, 8:42pm

Yeah, we don’t have usable support for native quantized GPU ops. There are also some related discussions on GitHub: Quantized Inference on GPU summary of resources · Issue #87395 · pytorch/pytorch
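As a stopgap, plain tensor ops can at least store weights as 8-bit integers on the GPU. The sketch below is only an illustration and is not a PyTorch quantization API: `quantize_int8_symmetric`, `dequantize_int8` and `Int8WeightOnlyLinear` are hypothetical helpers. It assumes symmetric per-tensor quantization (w ≈ scale · q with q in [-127, 127]). Because the weights are dequantized back to float before the matmul, this reduces weight memory but does not use int8 kernels, so it should not be expected to give the speedups TensorRT provides.

```python
import torch
import torch.nn.functional as F


def quantize_int8_symmetric(w: torch.Tensor):
    """Hypothetical helper: symmetric per-tensor int8 quantization, w ~= scale * q."""
    # Map the largest absolute weight onto 127; clamp avoids division by zero.
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale


def dequantize_int8(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: map int8 values back to float32."""
    return q.to(torch.float32) * scale


class Int8WeightOnlyLinear(torch.nn.Module):
    """Hypothetical module: keeps int8 weights, but computes the matmul in float."""

    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        q, scale = quantize_int8_symmetric(linear.weight.detach())
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        bias = linear.bias.detach().clone() if linear.bias is not None else None
        self.register_buffer("bias", bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; no int8 GEMM kernel is involved here.
        w = dequantize_int8(self.q_weight, self.scale).to(x.dtype)
        return F.linear(x, w, self.bias)


# Usage: runs on the GPU if one is available, otherwise on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
torch.manual_seed(0)
fp = torch.nn.Linear(64, 32).to(device)
qlin = Int8WeightOnlyLinear(fp)
x = torch.randn(8, 64, device=device)
with torch.no_grad():
    err = (fp(x) - qlin(x)).abs().max().item()
print(qlin.q_weight.dtype, err)
```

Per-channel scales (one scale per output row) would usually give lower error than the single per-tensor scale assumed here.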

fabian_schutze · January 27, 2023, 8:36am

Thanks a lot for your replies, @Vasiliy_Kuznetsov and @jerryzh168. They were both very informative.