Hi, I have some questions about quantization and deployment:
First, I'd like to ask: is the JIT model exported after FX quantization in PyTorch considered a deployment model or a quantized model? Has any backend-specific processing already been applied to it?
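For context, this is roughly the workflow I mean, a minimal sketch using the fbgemm backend via the torch.ao.quantization FX graph-mode path; the toy model and calibration input are placeholders:

```python
import torch
from torch.ao.quantization import get_default_qconfig_mapping
from torch.ao.quantization.quantize_fx import prepare_fx, convert_fx

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3),
    torch.nn.ReLU(),
).eval()

example_inputs = (torch.randn(1, 3, 32, 32),)

# Insert observers according to the chosen backend's qconfig
qconfig_mapping = get_default_qconfig_mapping("fbgemm")
prepared = prepare_fx(model, qconfig_mapping, example_inputs)

# Calibrate with representative data (placeholder here)
with torch.no_grad():
    prepared(*example_inputs)

# Lower to the backend's actual quantized kernels
quantized = convert_fx(prepared)

# The step my question is about: tracing and saving the quantized GraphModule
scripted = torch.jit.trace(quantized, example_inputs)
torch.jit.save(scripted, "quantized_jit.pt")
```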
Also, I've seen on GitHub that some people directly redefine QConv and QLinear operators to replace the Conv and Linear modules in the model structure. Is this approach not feasible for actual deployment?
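To make sure I'm describing the same thing, here is a sketch of the kind of hand-rolled replacement I mean: a wrapper that fake-quantizes the weight and then falls back to a float conv, instead of calling a backend quantized kernel. `QConv2d` and `swap_conv` are hypothetical names I made up for illustration, not from any specific repo:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QConv2d(nn.Module):
    """Hypothetical hand-rolled 'quantized' conv: simulates int8 weights
    with quantize-dequantize, then runs a normal float convolution."""
    def __init__(self, conv: nn.Conv2d, scale: float = 0.02, zero_point: int = 0):
        super().__init__()
        self.conv = conv
        self.scale = scale          # placeholder scale, normally calibrated
        self.zero_point = zero_point

    def forward(self, x):
        # Quantize-dequantize the weight on the fly (simulated int8 range)
        qw = torch.fake_quantize_per_tensor_affine(
            self.conv.weight, self.scale, self.zero_point, -128, 127)
        return F.conv2d(x, qw, self.conv.bias, self.conv.stride,
                        self.conv.padding, self.conv.dilation, self.conv.groups)

def swap_conv(model: nn.Module):
    # Recursively replace every nn.Conv2d with the hand-rolled QConv2d
    for name, child in model.named_children():
        if isinstance(child, nn.Conv2d):
            setattr(model, name, QConv2d(child))
        else:
            swap_conv(child)
```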
Lastly, how can I export a quantized model from PyTorch to other backend platforms, such as TensorRT, OpenVINO, or ONNX Runtime? It seems that PyTorch does not support exporting quantized models to ONNX.
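The closest workaround I've found is to export the *float* model and let ONNX Runtime quantize it on its side, rather than exporting the already-quantized PyTorch model. A sketch, assuming onnxruntime's quantization tooling is installed; the toy model and filenames are placeholders:

```python
import torch
from onnxruntime.quantization import quantize_dynamic, QuantType

# Export the float model first; quantization happens on the ONNX side
model = torch.nn.Sequential(torch.nn.Linear(16, 4)).eval()
dummy = torch.randn(1, 16)
torch.onnx.export(model, dummy, "model_fp32.onnx",
                  input_names=["input"], output_names=["output"])

# Let ONNX Runtime quantize the exported graph (dynamic int8 weights)
quantize_dynamic("model_fp32.onnx", "model_int8.onnx",
                 weight_type=QuantType.QInt8)
```

Is this the recommended route, or is there a way to export the PyTorch-quantized graph directly?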