wolterlw
(Volodymyr)
February 13, 2020, 1:14am
Check out these threads:
Since your dataset is tiny, I don’t think that multiple workers will help you much.
It seems you are currently just slicing the tensors without any transformations.
You could try to load all the data, push it to the GPU beforehand, and slice the batches manually in your training loop. Maybe this will speed up your training a bit.
However, your data and model might be just too small to get a high GPU utilization.
As a small side note: you shouldn’t call the forward method of your model, but the mod…
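A minimal sketch of that "push everything to the GPU and slice manually" idea, assuming the whole dataset fits in GPU memory (the toy data and model below are placeholders, not from the thread):

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Push the whole (tiny) dataset to the GPU once, up front.
X = torch.randn(1000, 20, device=device)
y = torch.randint(0, 2, (1000,), device=device)

model = nn.Linear(20, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

batch_size, n = 64, X.size(0)
for epoch in range(5):
    perm = torch.randperm(n, device=device)   # shuffle indices on-device
    for i in range(0, n, batch_size):
        idx = perm[i:i + batch_size]
        xb, yb = X[idx], y[idx]               # slice the batch manually
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)       # call model(xb), not model.forward(xb)
        loss.backward()
        optimizer.step()
```

This skips the DataLoader entirely, so there is no per-batch host-to-device copy and no worker overhead at all.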
The DataLoader creates a new PROCESS for every worker, and the dataset has to be "copied" to every worker. Depending on whether you're on Windows or Linux (spawning a process in Windows is much more expensive than forking one in Linux), and on how the dataset stores its data (Tensors don't seem to get copied from what I've tested, but Python structures do), you might have a very high overhead for creating the processes.
Unless you’re working with some supercomputer, I believe 8 workers is more than…
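If you want to see that overhead concretely, a rough (hypothetical) timing harness is to iterate the loader once for each `num_workers` setting and compare:

```python
import time
import torch
from torch.utils.data import DataLoader, TensorDataset

def time_one_pass(num_workers: int) -> float:
    # Toy in-memory dataset as a placeholder for your own.
    dataset = TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,)))
    loader = DataLoader(dataset, batch_size=64, num_workers=num_workers)
    start = time.perf_counter()
    for _ in loader:          # just iterate; we only care about loading time
        pass
    return time.perf_counter() - start

if __name__ == "__main__":    # guard required on Windows, where workers are spawned
    for n in (0, 2, 4, 8):
        print(f"num_workers={n}: {time_one_pass(n):.3f}s")
```

On a small in-memory dataset like this, `num_workers=0` often wins outright, since the worker startup cost can't be amortized.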
93% is excellent utilization, and I believe lower GPU utilization during validation is expected: since you don't compute gradients or make parameter updates there, the process is a lot more data-intensive.
As a first step, I'd suggest profiling your DataLoader against the training step to figure out where the bottleneck actually is.
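A minimal sketch of that kind of split timing, again with toy data and a toy model as placeholders for your own:

```python
import time
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

if __name__ == "__main__":  # guard needed if workers are spawned (Windows)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    loader = DataLoader(
        TensorDataset(torch.randn(1000, 20), torch.randint(0, 2, (1000,))),
        batch_size=64,
        num_workers=2,
    )
    model = nn.Linear(20, 2).to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    data_time = compute_time = 0.0
    t0 = time.perf_counter()
    for xb, yb in loader:
        t1 = time.perf_counter()
        data_time += t1 - t0                  # time spent waiting on the loader
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        loss = criterion(model(xb), yb)
        loss.backward()
        optimizer.step()
        if device.type == "cuda":
            torch.cuda.synchronize()          # wait for the GPU so timing is honest
        t0 = time.perf_counter()
        compute_time += t0 - t1               # time spent in the training step
    print(f"data: {data_time:.3f}s  compute: {compute_time:.3f}s")
```

If `data` dominates, the input pipeline is the bottleneck; if `compute` dominates, the DataLoader is keeping up and there's little to gain from tuning it.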