C++ optimization using LibTorch

Hi all,
I have implemented C++ inference for PyTorch models by following this tutorial: Loading a TorchScript Model in C++ — PyTorch Tutorials 1.7.1 documentation
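
For reference, my code roughly follows the tutorial's pattern (a simplified sketch; the model path and input shape are placeholders for my actual setup):

```cpp
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
  // Load the exported TorchScript model (path is a placeholder).
  torch::jit::script::Module module = torch::jit::load("model.pt");
  module.eval();

  // Run on GPU when available, otherwise CPU.
  torch::Device device = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
  module.to(device);

  // Disable autograd bookkeeping during inference.
  torch::NoGradGuard no_grad;

  // Dummy input; the shape stands in for my real preprocessed input.
  std::vector<torch::jit::IValue> inputs;
  inputs.push_back(torch::ones({1, 3, 224, 224}).to(device));

  at::Tensor output = module.forward(inputs).toTensor();
  std::cout << output.sizes() << std::endl;
  return 0;
}
```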
Are there any ways I can optimize beyond this, on both CPU and GPU?
Note: LibTorch is available for both CPU and GPU, and I would like to know … the optimizations I can apply on top of already using LibTorch.