I’m trying to get the most out of my machine. I’ve tried many things already, but I believe there’s still more to squeeze out of it; no matter what I do, I can’t get the last drops.
This is my hardware’s current utilization state:
I labelled the CPU and GPU charts: the CPU chart is at the top and the GPU chart at the bottom, and both grow outward from the middle (the program is btop). My question is: how can I move the two peaks inside the blue rectangle to the left (the charts scroll from right to left)? The peaks are the CPU processing the image augmentations. As far as I can tell, this work could run in parallel with the GPU processing the current batch, but no matter what I do, it doesn’t start until the GPU is done with the previous batch. BTW, there are two pairs of peaks because one is for the training dataset and the other is for the validation dataset.
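To check whether the training loop is actually stalling on data, one option is to time how long each `next()` call on the loader blocks. Below is a minimal, framework-agnostic sketch; `timed_batches` and `slow_source` are hypothetical names, not PyTorch APIs, and the helper should work with any iterable, including a `DataLoader`. Because CUDA kernels are launched asynchronously, a long wait here means the Python loop was starved for data, which is a reasonable proxy for, but not proof of, an idle GPU.

```python
import time
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def timed_batches(loader: Iterable[T], waits: List[float]) -> Iterator[T]:
    """Yield every item of `loader` unchanged, appending to `waits` the
    number of seconds the caller was blocked waiting for that item."""
    it = iter(loader)
    while True:
        start = time.perf_counter()
        try:
            batch = next(it)  # this is where the training loop would stall
        except StopIteration:
            return
        waits.append(time.perf_counter() - start)
        yield batch


def slow_source(n: int, delay: float) -> Iterator[int]:
    """Hypothetical stand-in for a loader that needs `delay` seconds per batch."""
    for i in range(n):
        time.sleep(delay)
        yield i


if __name__ == "__main__":
    waits: List[float] = []
    for batch in timed_batches(slow_source(5, 0.01), waits):
        time.sleep(0.02)  # stand-in for the GPU step on `batch`
    print(f"batches={len(waits)} mean_wait={sum(waits) / len(waits):.3f}s")
```

With a real loader, the same pattern is `for batch in timed_batches(train_loader, waits):`, where `train_loader` stands for your own `DataLoader`. Note that the first wait of each epoch also includes starting the worker processes unless `persistent_workers=True` is set.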
This is what I’ve tried so far:
- AMP (Automatic Mixed Precision): I know it has nothing to do with my question, but I thought I should mention it.
- `num_workers=8`: My CPU has 8 cores and 16 threads. Through experimentation, I found that more than 8 workers doesn’t help.
- `prefetch_factor=2`: Technically, I didn’t set this parameter; 2 is the default value. I did experiment with higher values, but that didn’t help either, so I changed it back.
- `pin_memory=False`: I did turn this on at one point, and it helped, but only by about 10%. The shape of the charts was more or less the same as shown here; perhaps the only difference was that the sawtooth pattern of the peaks was smoothed out. But I had to turn it off because I was randomly running out of VRAM.
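A toy model of the overlap these settings are supposed to provide may help frame the question. The sketch below is stdlib-only and is not how PyTorch implements its workers: threads and `time.sleep` stand in for the augmentation and the GPU step, and `run_pipelined` and `run_sequential` are hypothetical names. Workers prepare batches into a buffer capped at `num_workers * prefetch_factor` while the main loop is busy, which roughly mirrors the documented meaning of `prefetch_factor` as the number of batches loaded in advance by each worker.

```python
import queue
import threading
import time
from typing import Optional, Tuple


def run_pipelined(n_batches: int, num_workers: int, prefetch_factor: int,
                  prep_s: float, step_s: float) -> Tuple[float, int]:
    """Toy prefetching loader: worker threads 'augment' batches into a buffer
    capped at num_workers * prefetch_factor while the main thread 'trains'.
    Returns (wall-clock seconds, batches consumed)."""
    buf: "queue.Queue[Optional[int]]" = queue.Queue(maxsize=num_workers * prefetch_factor)

    def worker(wid: int) -> None:
        # Round-robin split of batch indices across workers.
        for b in range(wid, n_batches, num_workers):
            time.sleep(prep_s)  # stand-in for CPU-side augmentation
            buf.put(b)          # blocks while the prefetch buffer is full
        buf.put(None)           # sentinel: this worker has finished

    start = time.perf_counter()
    threads = [threading.Thread(target=worker, args=(w,)) for w in range(num_workers)]
    for t in threads:
        t.start()
    finished = consumed = 0
    while finished < num_workers:
        item = buf.get()
        if item is None:
            finished += 1
            continue
        time.sleep(step_s)      # stand-in for the GPU step; workers keep running
        consumed += 1
    for t in threads:
        t.join()
    return time.perf_counter() - start, consumed


def run_sequential(n_batches: int, prep_s: float, step_s: float) -> float:
    """Same work with no overlap (roughly what num_workers=0 looks like)."""
    start = time.perf_counter()
    for _ in range(n_batches):
        time.sleep(prep_s)
        time.sleep(step_s)
    return time.perf_counter() - start


if __name__ == "__main__":
    piped, n = run_pipelined(16, num_workers=4, prefetch_factor=2, prep_s=0.04, step_s=0.02)
    seq = run_sequential(16, prep_s=0.04, step_s=0.02)
    print(f"pipelined {piped:.2f}s for {n} batches vs sequential {seq:.2f}s")
```

In this model the pipelined run costs roughly one step per batch instead of preparation plus step. It ignores process startup and the inter-process copying of batches that real `DataLoader` workers pay.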
And now I’m out of ideas. I don’t understand why the next batch can’t be processed in parallel while the GPU is working on the previous one. Any ideas?