Mixed precision training and FP16 weights

Hello,

My objectives are:

  1. Train a network using mixed precision and then obtain the weights in FP16 - I need a smaller model so that inference with TensorRT can be optimized.

I know we can compile with FP16 enabled using Torch-TensorRT, but with recent releases of Torch-TensorRT I have observed FP16 performance to be unacceptable.
I also get warnings during compilation, e.g. "Subnormal FP16 values encountered".
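
For reference, this is roughly how I compile at FP16 today (a minimal sketch; a torchvision ResNet-50 and a 1x3x224x224 input stand in for my actual model and input shape):

```
import torch
import torch_tensorrt
import torchvision

# torchvision ResNet-50 stands in here for my actual network
model = torchvision.models.resnet50(pretrained=True).eval().cuda()
scripted = torch.jit.trace(model, torch.randn(1, 3, 224, 224, device="cuda"))

# FP16 compilation; this is where I see the "Subnormal FP16 values encountered"
# warnings and the poor inference performance
trt_model = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.half},
)
```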

Hence, my assumption is that if I train using mixed precision and start out with FP16 weights, TensorRT compilation with FP16 should hopefully work better.

Therefore, could you please advise how I can do that?
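
For objective 1, the loop below is roughly what I have in mind (a minimal sketch using torch.cuda.amp; the ResNet-50 and the random tensors stand in for my actual network and data loader). Is simply casting to .half() at the end the right way to obtain FP16 weights for Torch-TensorRT?

```
import torch
import torchvision
from torch.cuda.amp import autocast, GradScaler
from torch.utils.data import DataLoader, TensorDataset

# ResNet-50 and random tensors stand in for my actual model and dataset
model = torchvision.models.resnet50(num_classes=10).cuda()
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 3, 224, 224), torch.randint(0, 10, (64,))),
    batch_size=8,
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.CrossEntropyLoss()
scaler = GradScaler()

model.train()
for inputs, targets in train_loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    with autocast():                # forward pass runs in mixed precision
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()   # loss scaling to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()

# Is casting the FP32 master weights to half precision afterwards the
# right way to obtain an FP16 model for Torch-TensorRT?
model_fp16 = model.half().eval()
torch.save(model_fp16.state_dict(), "model_fp16.pth")
```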

  2. Train a PyTorch model using fake quantization with INT8 weights, and end up with INT8 weights I can use with Torch-TensorRT.
    I am aware of this - Quantizing Resnet50 — pytorch-quantization master documentation

Unfortunately, the example ends at the export to ONNX and does not explain how to use the result with TensorRT or Torch-TensorRT.
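
To show where I get stuck, this is (condensed) the point where the linked example leaves off; the calibration and fine-tuning steps are elided here:

```
import torch
import torchvision
from pytorch_quantization import quant_modules
from pytorch_quantization import nn as quant_nn

# Replace torch layers with quantized equivalents before building the model
quant_modules.initialize()
model = torchvision.models.resnet50(pretrained=True).cuda()

# ... calibration and quantization-aware fine-tuning, as in the linked example ...

# The linked example ends here: exporting the fake-quantized model to ONNX
quant_nn.TensorQuantizer.use_fb_fake_quant = True   # emit Q/DQ nodes in the ONNX graph
dummy_input = torch.randn(1, 3, 224, 224, device="cuda")
torch.onnx.export(model.eval(), dummy_input, "resnet50_qat.onnx", opset_version=13)
```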

Could you please elaborate on that?

Basically, I expect to get INT8 model weights after this fake-quantize training and wish to use them with Torch-TensorRT for final deployment.

Could you please provide an example showing how?
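
My best guess at the Torch-TensorRT side is below, continuing from the sketch above (this is only a guess on my part; the input shape and whether an explicit calibrator is still needed after QAT are assumptions):

```
import torch
import torch_tensorrt
from pytorch_quantization import nn as quant_nn

# Trace the fine-tuned fake-quantized model from the sketch above
quant_nn.TensorQuantizer.use_fb_fake_quant = True
qat_model = model.eval().cuda()
scripted = torch.jit.trace(qat_model, torch.randn(1, 3, 224, 224, device="cuda"))

# Is this the intended way to hand a fake-quantized model to Torch-TensorRT,
# or is an explicit INT8 calibrator still required after QAT?
trt_model = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
)
```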

Best Regards