The worker processes requested via num_workers in the DataLoader run on the CPU using multiprocessing and will not be shown in nvidia-smi.
GPU utilization might be low if your code has a bottleneck elsewhere, such as data loading.
Take a look at this post, which explains it well and suggests some performance improvements.
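To check whether data loading is actually the bottleneck, one rough approach is to time how long the training loop waits for each batch versus how long it spends in the training step. The sketch below is only an illustration: `profile_loop` and `train_step` are hypothetical names, and the helper accepts any iterable, so a DataLoader can be passed in directly.

```python
import time


def profile_loop(loader, train_step):
    """Return (wait, compute): seconds spent waiting for batches vs. running train_step."""
    wait = 0.0
    compute = 0.0
    t0 = time.perf_counter()
    for batch in loader:
        t1 = time.perf_counter()
        wait += t1 - t0      # time blocked on the loader for this batch
        train_step(batch)    # hypothetical callable: forward, backward, optimizer step
        # CUDA kernels launch asynchronously, so on a GPU train_step should end
        # with torch.cuda.synchronize(), otherwise compute time is undercounted.
        t0 = time.perf_counter()
        compute += t0 - t1   # time spent in the training step
    return wait, compute
```

If `wait` dominates `compute`, the GPU is likely idle while it waits for data, and changes such as a higher num_workers are worth trying. If `compute` dominates, the loader is probably not the limiting factor.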
This is not correct. The model will use the specified device, and PyTorch will not automatically fall back to the CPU based on the performance of the DataLoader.