The difference will ultimately depend on what your model is doing, but in general AMP is much more limited in the optimizations it can apply for inference compared to TensorRT, which in principle can apply arbitrary fusions and use specialized inference kernels. For example, AMP in eager mode won't opportunistically fuse successive pointwise operations, and it may insert unnecessary up/down-casts between them. Additionally, you may see greater CPU overhead from AMP/eager-mode dispatching compared to TensorRT, though the severity again depends on what your model is doing (e.g., the ratio of compute-bound vs. bandwidth-bound ops) and the coverage of TensorRT's optimizations for your use case.
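For concreteness, here is a minimal sketch of what eager-mode AMP inference looks like; the two-layer model is purely illustrative, and this uses CPU autocast with bfloat16 so it runs anywhere (on GPU you'd typically use `device_type="cuda"` with float16):

```python
import torch

# Illustrative model -- any eager-mode nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
).eval()

x = torch.randn(8, 64)

# Autocast wraps normal eager dispatch: each op independently picks its
# precision, so successive pointwise ops still run as separate kernels
# (no fusion), with casts potentially inserted between them.
with torch.inference_mode(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = model(x)

print(out.dtype)  # linear layers run in bfloat16 under autocast
```

Because each op is dispatched one at a time, the per-op CPU overhead mentioned above applies to every kernel launch, whereas TensorRT compiles the whole graph ahead of time.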