Why is PyTorch's GPU utilization so low in production (NOT training)?

Thanks for the detailed analysis.
I assume that the times in your output are per-iteration times.

To me it looks like the data loading time is hidden in your training script. Each training step takes long enough that the workers can preload the next batch in the meantime, which is why the measured loading time is so low (0.08s). During validation the workload per batch is much smaller, since you only compute the forward pass, so the data loading time is no longer hidden. This is probably also the reason for the low GPU utilization: with the lighter validation workload, data loading becomes the bottleneck and the GPU waits for the next batch. Your Dataset implementation looks alright.
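One rough way to confirm this is to measure, for each loop, how much time is spent blocked on the loader versus running the model. The sketch below is a hypothetical helper (`profile_loop` and `step_fn` are illustrative names, not PyTorch APIs); it assumes you pass in your own `DataLoader` and a `step_fn` that runs the forward pass (plus backward and optimizer step for training). Because CUDA kernels run asynchronously, you would pass `torch.cuda.synchronize` as `sync`; otherwise the step timer mostly measures kernel launches rather than GPU work.

```python
import time
from typing import Callable, Iterable, Optional


def profile_loop(
    loader: Iterable,
    step_fn: Callable,
    sync: Optional[Callable[[], None]] = None,
) -> dict:
    """Split wall time into 'waiting for data' and 'running the step'.

    `sync` is an optional hook (e.g. torch.cuda.synchronize) so that
    queued GPU work has finished before the step timer stops.
    """
    data_time = 0.0
    step_time = 0.0
    iters = 0
    it = iter(loader)
    while True:
        t0 = time.perf_counter()
        try:
            # Blocks here if the workers have not prepared the next batch yet.
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)
        if sync is not None:
            sync()  # wait for asynchronous GPU kernels to complete
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        iters += 1
    total = data_time + step_time
    return {
        "iters": iters,
        "data_time": data_time,
        "step_time": step_time,
        # Share of the loop spent waiting for data; close to 1.0 means starved.
        "data_fraction": data_time / total if total > 0 else 0.0,
    }
```

For validation you might call something like `profile_loop(val_loader, lambda b: model(b[0].cuda()), sync=torch.cuda.synchronize)` inside `torch.no_grad()` (the batch layout here is an assumption). A `data_fraction` that is much higher during validation than during training would match the data loading bottleneck described above.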

The first iteration might take a bit longer, since all workers start loading their first batches at the same time and need some "warm-up time".
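When averaging iteration times, it can therefore help to drop the first iteration(s). A minimal sketch, assuming plain wall-clock timing; `iteration_times` and `mean_after_warmup` are hypothetical helpers, and for GPU work you would call `torch.cuda.synchronize()` inside `step_fn` so the times reflect finished kernels.

```python
import time
from typing import Callable, Iterable, List


def iteration_times(loader: Iterable, step_fn: Callable) -> List[float]:
    """Wall time of each iteration (data fetch + step)."""
    times = []
    t_prev = time.perf_counter()
    # iter(loader) is called after t_prev is taken, so the first entry also
    # includes loader/worker startup, i.e. the warm-up cost.
    for batch in loader:
        step_fn(batch)
        t_now = time.perf_counter()
        times.append(t_now - t_prev)
        t_prev = t_now
    return times


def mean_after_warmup(times: List[float], warmup: int = 1) -> float:
    """Average iteration time, ignoring the first `warmup` iterations."""
    steady = times[warmup:]
    if not steady:
        raise ValueError("not enough iterations after warm-up")
    return sum(steady) / len(steady)
```

For example, with made-up times `[0.9, 0.08, 0.08, 0.08]` the default call averages only the last three entries (roughly 0.08), while `warmup=0` would report about 0.285 because the slow first iteration is included.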
