GPU: high memory usage, low GPU volatile-util

If my CUDA utilisation in Task Manager is already >95%, is there any way left to speed up my training?

More info: my overall CPU usage is 100% while GPU usage is ~30%, so I was trying to figure out whether there are overheads.

Thanks in advance!

You could still speed up the workload by e.g. using more efficient algorithms, avoiding unnecessary synchronizations, etc., as described in the performance guide.
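One common source of unnecessary synchronization is calling `.item()` on a GPU tensor every training step (e.g. for logging), which forces a host-device sync. A minimal sketch of the idea, using a stand-in tensor in place of a real loss:

```python
import torch

# Works on CPU too, but the sync cost only matters on a GPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

losses = []
for step in range(10):
    loss = torch.randn(1, device=device).abs()  # stand-in for a real loss
    # Bad: loss.item() here would force a host-device sync every step.
    # Better: keep the value on the device and sync once at the end.
    losses.append(loss.detach())

# Single sync for the whole loop when the scalar is actually needed.
total = torch.stack(losses).sum().item()
```

The same pattern applies to accuracy counters and other metrics: accumulate them as tensors on the device and only move them to the host when you log.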

I would recommend checking the linked guide first, then profiling the workload with the PyTorch profiler or e.g. Nsight Systems to see where the bottleneck is.
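A minimal sketch of using the PyTorch profiler on a toy model (the model and input here are hypothetical placeholders; on a real workload you would profile a few training steps and add `ProfilerActivity.CUDA`):

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Hypothetical stand-ins for the real model and batch.
model = torch.nn.Linear(128, 64)
x = torch.randn(32, 128)

# Profile CPU ops; add ProfilerActivity.CUDA when running on a GPU.
with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Summary table of the most expensive ops.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

If the table is dominated by data loading or CPU-side ops rather than kernels, that points to the CPU being the bottleneck, which would match the 100% CPU / ~30% GPU usage you are seeing.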


Very useful advice, thanks a lot!

Thanks! That was very useful advice! Be careful with all logging and metric behaviors!