From what I’ve seen, this is mostly use-case dependent.
There are a few similar previous questions here, e.g. "How to ensure my GPU is utilized to the fullest?"
Most of the answers there come down to batch size.
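If you want to check the batch size effect yourself, here is a rough, framework-agnostic sketch of a throughput sweep. Everything in it (`measure_throughput`, `sweep_batch_sizes`, `toy_step`) is a name I made up for illustration, and `toy_step` is only a stand-in workload with a fixed per-call overhead, not a real training step. You would swap in your own forward/backward pass.

```python
import time


def measure_throughput(step_fn, batch_size, n_iters=20, warmup=3):
    """Return samples processed per second for one batch size.

    step_fn is a hypothetical callable that runs one step on a batch
    of the given size.
    """
    # Warm-up calls are not timed (caches, lazy initialisation, etc.).
    for _ in range(warmup):
        step_fn(batch_size)

    start = time.perf_counter()
    for _ in range(n_iters):
        step_fn(batch_size)
    elapsed = time.perf_counter() - start
    return batch_size * n_iters / elapsed


def sweep_batch_sizes(step_fn, batch_sizes, **kwargs):
    """Measure throughput for each batch size and return {batch_size: samples/sec}."""
    return {bs: measure_throughput(step_fn, bs, **kwargs) for bs in batch_sizes}


def toy_step(batch_size):
    """Stand-in workload: a fixed per-call overhead plus per-sample work."""
    time.sleep(0.002)                            # imitates fixed per-step overhead
    sum(i * i for i in range(batch_size * 100))  # imitates per-sample compute


if __name__ == "__main__":
    for bs, tput in sweep_batch_sizes(toy_step, [1, 8, 64]).items():
        print(f"batch_size={bs:3d}  {tput:10.1f} samples/sec")
```

With a real PyTorch step on the GPU you would also need to call `torch.cuda.synchronize()` before reading the timer, since CUDA kernels run asynchronously, and stop increasing the batch size once you run out of memory. Throughput usually flattens out at some point, and that is roughly where the GPU is saturated for that model.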
For a few generic things to try, a good guide is the Performance Tuning Guide in the PyTorch Tutorials (2.4.0+cu121 documentation).
I have not come across anything better.