What are the best practices to do before moving the model to TensorRT?

I have a PyTorch model and I want to deploy it with TensorRT. I want to know the best practices to carry out before exporting the model to ONNX (should I train it in FP32, or in FP16 using AMP?), the best practices to apply while the model is in ONNX (should I use anything from the ONNX optimization toolkit?), and the best practices to carry out once the model is in TensorRT.
The goal is to minimize inference time in the deployment stage. Since there are so many different articles and methods for optimizing a PyTorch model, including PyTorch JIT and TorchScript, I am confused about which steps to carry out and which to skip.