My objectives are:
- Train a network using mixed precision and then obtain the weights in FP16, so the smaller model can be optimized for inference with TensorRT.
I know we can compile with FP16 weights using Torch-TensorRT, but with the recent Torch-TensorRT releases I have observed unacceptable FP16 performance.
I also get warnings during compilation: "Subnormal FP16 values encountered".
Hence my assumption is that if I train using mixed precision and start from FP16 weights, TensorRT compilation at FP16 should work better.
Please advise how I can do that.
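To make the first objective concrete, this is roughly what I have in mind (a minimal sketch using PyTorch's standard AMP API; the toy model, shapes, and training loop are placeholders for my real network):

```python
import torch
import torch.nn as nn

# Toy stand-in for the real network; any nn.Module should behave the same way.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

opt = torch.optim.SGD(model.parameters(), lr=1e-3)
# GradScaler is a no-op when disabled, so the sketch also runs on CPU.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(8, 16, device=device)  # placeholder data
y = torch.randn(8, 4, device=device)

for _ in range(3):  # placeholder training loop
    opt.zero_grad()
    # Under autocast, eligible ops run in FP16 while the master weights stay FP32.
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()

# AMP keeps FP32 master weights, so an explicit cast is needed to get FP16 weights.
model_fp16 = model.half().eval()
assert all(p.dtype == torch.float16 for p in model_fp16.parameters())
```

My understanding is that this FP16 module could then be passed to `torch_tensorrt.compile(..., enabled_precisions={torch.half})`. Is that the right approach, or should the FP32 model be compiled and the cast left entirely to TensorRT?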
- Train a PyTorch model with fake quantization (QAT) so that the weights are effectively INT8, and use those weights with Torch-TensorRT.
I am aware of this: Quantizing Resnet50 — pytorch-quantization master documentation.
Unfortunately, that example ends at ONNX export and does not explain how to use the result with TensorRT or Torch-TensorRT.
Could you please elaborate on that?
Basically, I expect to get INT8 model weights after this fake-quantization training and wish to use them with Torch-TensorRT for final deployment.
Please provide an example to show how.
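For reference, this is the rough flow I have pieced together from the pytorch-quantization docs and the Torch-TensorRT materials (a sketch only, untested on my side; `MyModel`, the input shape, and the elided calibration step are placeholders):

```
import torch
import torch_tensorrt
from pytorch_quantization import quant_modules

# Patch torch.nn layers with fake-quantized equivalents *before* building the
# model, as in the pytorch-quantization ResNet50 example.
quant_modules.initialize()
model = MyModel().cuda().eval()  # hypothetical model with QAT checkpoint loaded
# ... calibration / QAT fine-tuning as in the pytorch-quantization docs ...

# TorchScript the fake-quant model so Torch-TensorRT can see the Q/DQ nodes.
example = torch.randn(1, 3, 224, 224, device="cuda")
with torch.no_grad():
    traced = torch.jit.trace(model, example)

# Ask Torch-TensorRT to lower the Q/DQ pairs to real INT8 kernels.
trt_model = torch_tensorrt.compile(
    traced,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
)
```

Is this the intended path, i.e. do the INT8 scales come from the fake-quantize nodes rather than a separate calibrator, and is anything else required between QAT and `torch_tensorrt.compile`?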