I’m using PyTorch on an AWS machine running Linux, with 32 CPUs and 1 GPU.
My data is stored on the machine’s SSD and is loaded during training inside the Dataset. I am using the DataLoader class to feed data to the model. I notice that neither the GPU nor the CPU is saturated. I use
watch -n 0.5 nvidia-smi to monitor GPU usage and
htop to monitor CPU usage. GPU usage averages around 30%. All 32 CPUs are at 100% during the very beginning of training (maybe the first batch, only a few seconds), but then only 4 or 5 are used for the rest of it.
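To pin down where the time goes, here is a minimal timing sketch (pure Python) that I can wrap around the training loop; `step_fn` is a stand-in for my forward/backward/optimizer step:

```python
import time

def profile_loop(batches, step_fn):
    """Split wall-clock time per batch into: waiting for data vs. computing."""
    data_time = 0.0
    compute_time = 0.0
    t0 = time.perf_counter()
    for batch in batches:
        t1 = time.perf_counter()
        data_time += t1 - t0      # time spent blocked on the DataLoader
        step_fn(batch)            # forward/backward/optimizer step
        t0 = time.perf_counter()
        compute_time += t0 - t1   # time spent in the training step
    return data_time, compute_time
```

If `data_time` dominates, the loop is starved by data loading even though no single resource shows 100% usage.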
If the GPU were the bottleneck, it should be at around 100% all the time, and if the CPUs were the bottleneck I would expect the same for all 32 of them.
How come CPU usage AND GPU usage are both so low at the same time? How can I speed up training?
I have tried mixed precision and a few other things, but my question is mostly about understanding why CPU and GPU usage are so low.
Here is my (train) DataLoader:
DataLoader(
    dataset=visits_dataset,
    batch_size=256,
    collate_fn=pad_collate,
    sampler=SubsetRandomSampler(train_indices),
    pin_memory=True,
    num_workers=32,
)
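In case it matters, here is a variant of the same call that I am considering trying next. `persistent_workers` and `prefetch_factor` are standard `torch.utils.data.DataLoader` arguments (available since PyTorch 1.7); the reduced `num_workers` value is a guess, not something I have validated:

```python
# Variant I could try (assumes visits_dataset, pad_collate, train_indices
# are defined as in my setup above).
loader = DataLoader(
    dataset=visits_dataset,
    batch_size=256,
    collate_fn=pad_collate,
    sampler=SubsetRandomSampler(train_indices),
    pin_memory=True,
    num_workers=8,             # fewer workers than CPUs, to avoid oversubscription
    persistent_workers=True,   # keep workers alive across epochs
    prefetch_factor=4,         # batches prefetched per worker (default is 2)
)
```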
I would be very thankful for any help.