Can gradient accumulation help better utilize GPU?

I’m looking for ways to improve GPU utilization.

I’ve already tried to max out the batch size to raise GPU utilization (the rightmost column that nvidia-smi reports). Beyond that, does increasing the number of gradient-accumulation steps help?

I’m noticing low GPU utilization and was wondering how to maximize it without increasing the batch size.

From what I’ve seen, this is mostly use-case dependent.

A few previous similar questions here: How to ensure my GPU is utilized to the fullest?

Most of them come down to batch size.
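For context on what accumulation actually changes: each backward pass still runs on only the micro-batch, so the per-kernel workload that GPU-Util reflects is set by the micro-batch size; accumulation mainly lets you reach a larger effective batch without extra memory. A minimal sketch of the loop (the toy model, sizes, and `accumulation_steps` value here are illustrative assumptions, not from this thread):

```python
import torch
from torch import nn

# Toy model/data to illustrate the mechanics only.
torch.manual_seed(0)
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
w0 = model.weight.detach().clone()  # snapshot to confirm updates happen

accumulation_steps = 4  # micro-batches accumulated per optimizer step
micro_batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for i, (x, y) in enumerate(micro_batches):
    loss = loss_fn(model(x), y)
    # Scale the loss so the summed gradients match one large batch of
    # accumulation_steps * 8 samples.
    (loss / accumulation_steps).backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()        # one weight update per 4 micro-batches
        optimizer.zero_grad()
```

Note that each `backward()` call above still launches kernels sized for an 8-sample micro-batch, which is why accumulation alone usually doesn’t move the utilization number.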

For some generic tuning tips, a good guide is the Performance Tuning Guide — PyTorch Tutorials 2.4.0+cu121 documentation.

I have not come across anything better.