Is there a way to do quantization (mostly 8-bit) on GPUs in native PyTorch while avoiding TensorRT?
The docs seem to indicate that quantization for GPUs is possible only with TensorRT. Is that correct? If it is not available in main, is there maybe a PR to work with?