The difference will ultimately depend on what your model is doing, but in general AMP is much more limited in the optimizations it can apply for inference compared to TensorRT, which in principle can apply arbitrary fusions and use specialized inference kernels. For example, AMP in eager mode won't opportunistically fuse successive pointwise operations, and it may insert unnecessary up/down-casts between them. Additionally, you may see greater CPU overhead from AMP/eager-mode dispatching compared to TensorRT, though the severity again depends on what your model is doing (e.g., the ratio of compute-bound vs. bandwidth-bound ops) and the coverage of TensorRT's optimizations for your use case.
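For concreteness, here is a minimal sketch of what eager-mode AMP inference looks like; the two-layer model is purely illustrative, and this uses CPU autocast with bfloat16 so it runs anywhere (on GPU you'd typically use `device_type="cuda"` with float16):

```python
import torch

# Illustrative model -- any eager-mode nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

x = torch.randn(8, 64)

# Autocast wraps normal eager dispatch: each op independently picks its
# precision, so successive pointwise ops still run as separate kernels
# (no fusion), with casts potentially inserted between them.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # linear layers run in bfloat16 under autocast
```

Because each op is dispatched one at a time, the per-op CPU overhead mentioned above applies to every kernel launch, whereas TensorRT compiles the whole graph ahead of time.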