I’m running someone else’s model locally, and I noticed that, as written, GPU utilization is really low and training is slow. My initial reaction was to increase the batch size by two orders of magnitude, but training was still slow and utilization stayed below 5%.
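To make that concrete, here’s a minimal PyTorch-style sketch of the kind of change I mean. Everything in it (dataset, model, batch sizes) is a placeholder for illustration, not the actual code I’m running:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model standing in for the repo's actual ones.
dataset = TensorDataset(torch.randn(50_000, 128), torch.randint(0, 10, (50_000,)))
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

# The only change I made: batch_size bumped from something like 32 to 4096
# (illustrative numbers, but roughly two orders of magnitude).
loader = DataLoader(dataset, batch_size=4096)

for inputs, targets in loader:
    inputs, targets = inputs.cuda(), targets.cuda()
    optimizer.zero_grad()
    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()
    optimizer.step()
```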
So, I have two questions:
- has anyone produced a nice tutorial on how to improve GPU utilization? This seems like an important topic for anyone creating their own models.
- are there “usual suspects” in model/dataset/dataloader code that, when overlooked, kill GPU performance? (The sketch below shows the kind of settings I mean.)
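For the second question, these are the kinds of knobs I’m aware of, again just a sketch with placeholder values rather than the repo’s actual settings; I honestly don’t know which of these matter most in practice:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

def main():
    # Placeholder dataset and model, as in the sketch above.
    dataset = TensorDataset(torch.randn(50_000, 128), torch.randint(0, 10, (50_000,)))
    model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).cuda()

    loader = DataLoader(
        dataset,
        batch_size=4096,
        num_workers=8,            # default is 0: all loading runs on the main process
        pin_memory=True,          # page-locked host memory allows faster copies to the GPU
        persistent_workers=True,  # keep worker processes alive between epochs
    )

    for inputs, targets in loader:
        # non_blocking copies only have an effect when the source tensor is pinned
        inputs = inputs.cuda(non_blocking=True)
        targets = targets.cuda(non_blocking=True)
        loss = nn.functional.cross_entropy(model(inputs), targets)
        loss.backward()

if __name__ == "__main__":
    main()
```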