I'm trying to use PyTorch for quantization. According to the documentation, there are three types: dynamic quantization, static quantization, and quantization-aware training.
I have some questions about GPU vs. CPU support. As I understand it, PyTorch doesn't provide quantized operator implementations on CUDA, while quantization-aware training happens in full floating point and therefore can run on CUDA. What is still unclear to me: I need to run the model for inference in C++ via libtorch. Once I finish quantization-aware training and convert the model, can the converted model run on the GPU?
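For context, the quantization-aware training workflow I have in mind looks roughly like this (a minimal sketch using the eager-mode `torch.ao.quantization` API; `TinyModel` is a made-up example model, not from my real project):

```python
import torch
import torch.nn as nn


class TinyModel(nn.Module):
    """Made-up example model with the Quant/DeQuant stubs QAT needs."""

    def __init__(self):
        super().__init__()
        self.quant = torch.ao.quantization.QuantStub()
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()
        self.dequant = torch.ao.quantization.DeQuantStub()

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc(x))
        return self.dequant(x)


model = TinyModel()
model.train()
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
torch.ao.quantization.prepare_qat(model, inplace=True)

# Fake-quant training still runs in float, so this part can move to CUDA:
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
# ... normal training loop would go here ...

# Conversion produces real int8 ops, which (per the docs) target CPU
# backends such as fbgemm/qnnpack, so convert and run on CPU:
model.to("cpu")
model.eval()
quantized = torch.ao.quantization.convert(model)
out = quantized(torch.randn(1, 4))
print(tuple(out.shape))
```

My question is essentially about the last step: whether that converted model (scripted and loaded in libtorch) can execute on the GPU, or whether only the pre-conversion training phase is CUDA-compatible.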
Can someone clarify this?
Thanks a lot and best wishes!