Optimizing a PyTorch deep learning model based on the Mask2Former architecture for the NVIDIA Jetson platform

Hello Team,
We have a deep learning model for panoptic segmentation, developed with the PyTorch framework. We want to optimize this model for the NVIDIA Jetson platform to improve inference speed. The model uses the recent transformer-based Mask2Former architecture.
Could you please suggest optimization techniques that can speed up inference and maximize FPS? Because the model uses a transformer-based architecture, we are finding it difficult to optimize.

Any suggestions would be appreciated.

Thanks
Prakash Sahoo