GPU utilization is low

I am training a simple DNN with 3 hidden layers of 512 nodes each. I am loading the data with a DataLoader, with batch size = 100 and num_workers = 128.
I checked GPU usage with nvidia-smi, and it always shows 1% GPU utilization.

How can I increase the GPU utilization?


Why did you use 128 workers? Why not num_workers=0?

If num_workers is higher, the DataLoader will load the data faster.

Did you check whether it works as you expect?

Yes, I checked by increasing num_workers.

How large is the dataset you used? How long is the input?

16283 batches in total; each batch is of size 100*1257.

I am using a custom DataLoader to load the data in batches.

The DataLoader creates a new process for every worker, and the dataset has to be “copied” to every worker. Depending on whether you’re on Windows or Linux (spawning a process on Windows is much more expensive than forking one on Linux), and on how the dataset stores its data (from what I’ve tested, tensors don’t seem to get copied, but Python structures do), you might have a very high overhead for creating the processes.

Unless you’re working with some supercomputer, I believe 8 workers is more than enough.
Also, most importantly, check your RAM usage: if your OS starts swapping memory to disk, it might get extremely slow.

Reduce the number of workers, and check for improvements.
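One way to check for improvements is to measure, per iteration, how long the loop waits for the next batch versus how long the training step takes. The sketch below is a minimal, hypothetical helper (time_loop and its arguments are not part of PyTorch); it should work with any iterable of batches, including a DataLoader. If the step runs on a GPU, CUDA calls are asynchronous, so step_fn would need to call torch.cuda.synchronize() before returning for the step time to be meaningful.

```python
import time

def time_loop(loader, step_fn, max_batches=50):
    """Split wall time into 'waiting for data' and 'running the step'.

    loader: any iterable of batches (for example a DataLoader).
    step_fn: callable that runs one training step on a batch.
    Returns (wait_seconds, step_seconds, batches_seen).
    """
    wait = step = 0.0
    seen = 0
    it = iter(loader)
    while seen < max_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)          # time spent here is data loading
        except StopIteration:
            break
        t1 = time.perf_counter()
        step_fn(batch)                # time spent here is compute
        t2 = time.perf_counter()
        wait += t1 - t0
        step += t2 - t1
        seen += 1
    return wait, step, seen


if __name__ == "__main__":
    # Toy stand-ins: a deliberately slow "loader" and a no-op "step".
    def slow_loader(n=5, delay=0.01):
        for i in range(n):
            time.sleep(delay)
            yield i

    wait, step, seen = time_loop(slow_loader(), lambda b: None)
    print(f"batches={seen} wait={wait:.3f}s step={step:.3f}s")
```

If the wait time dominates, the loop is bound by data loading; if both numbers are tiny, there may simply not be much work per batch.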

Your problem might be that there simply isn’t that much to do for the GPU (if your model is very small, or your batch size is very small, for example), not necessarily that it’s waiting for data.
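To get a rough sense of how little work this may be, the forward pass can be counted in floating-point operations. The sketch below assumes an input width of 1257 (taken from the 100*1257 batch size given earlier in the thread) and leaves out the output layer, whose size isn't stated; the 1 TFLOP/s throughput is an illustrative round number, not a measurement of any particular GPU.

```python
# Rough forward-pass cost of the hidden layers for one batch.
batch_size = 100
layer_widths = [1257, 512, 512, 512]   # assumed input width, then 3 hidden layers

def forward_flops(batch, widths):
    # A dense layer does batch * n_in * n_out multiply-adds; count each as 2 FLOPs.
    return sum(2 * batch * n_in * n_out for n_in, n_out in zip(widths, widths[1:]))

flops = forward_flops(batch_size, layer_widths)
assumed_flops_per_second = 1e12        # illustrative throughput, not measured
seconds = flops / assumed_flops_per_second

print(f"{flops:,} FLOPs per batch, about {seconds * 1e3:.2f} ms at the assumed rate")
```

Under these assumptions a batch takes a fraction of a millisecond of arithmetic, so fixed per-batch overhead could easily dominate.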

An easy way to check is to look for “pits” in GPU usage: if there are times when the GPU usage suddenly decreases, it’s probably waiting for data (although you probably won’t be able to see this now, as your “max” appears to be 1%).
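A minimal sketch of that check, assuming nvidia-smi is on the PATH: read_gpu_utilization and find_dips are hypothetical helper names, and the 50% threshold is an arbitrary choice. The sample readings at the bottom are made up for illustration.

```python
import subprocess

def read_gpu_utilization(gpu_index=0):
    """Return the current utilization (percent) of one GPU, via nvidia-smi."""
    out = subprocess.check_output(
        [
            "nvidia-smi",
            f"--id={gpu_index}",
            "--query-gpu=utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        text=True,
    )
    return int(out.strip().splitlines()[0])

def find_dips(samples, fraction=0.5):
    """Indices of samples that fall below `fraction` of the peak value."""
    if not samples:
        return []
    peak = max(samples)
    return [i for i, s in enumerate(samples) if s < fraction * peak]

# Made-up readings, for illustration only.
print(find_dips([90, 88, 10, 92, 5, 91]))   # dips at positions 2 and 4
print(find_dips([1, 1, 1, 1]))              # flat 1%: nothing stands out
```

In practice the samples would come from calling read_gpu_utilization in a loop (say, once per second) while training runs.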


Hi cosmin.pascaru,
I changed num_workers to 8. It still constantly shows 1% GPU usage. The CPU memory usage is listed below. How do I stop swapping memory to disk?

If the data type is float32, the size of the entire dataset is about 8GB, right? If so, you should put all the data in memory and set num_workers=0.
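The 8GB figure can be checked with a quick calculation from the numbers given earlier in the thread (16283 batches of 100*1257 values), assuming 4 bytes per float32 value:

```python
# Back-of-the-envelope size of the dataset if held in memory as float32.
num_batches = 16283
batch_size = 100
features = 1257
bytes_per_float32 = 4

total_values = num_batches * batch_size * features
total_bytes = total_values * bytes_per_float32

size_gb = total_bytes / 1e9       # decimal gigabytes
size_gib = total_bytes / 2**30    # binary gibibytes

print(f"{total_values:,} values, {size_gb:.2f} GB ({size_gib:.2f} GiB)")
```

That comes to roughly 8.19 GB, so the data only fits in memory if the machine has comfortably more RAM than that available.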

Hi Tony,
I have also tried num_workers=0. It still shows only 1% usage.

Did you put all the data in memory?

In which memory? I have set num_workers=0 in the DataLoader. All the data is in local storage, and I am loading it using the DataLoader.

Sorry, I meant CPU memory.

If we use a DataLoader, it will load the data into CPU memory.

You should use a custom dataset as follows:

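from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, path):
        super(CustomDataset, self).__init__()
        # Load everything into CPU memory once, up front.
        # loadFrom is a placeholder for your own loader; self.data is an assumed name.
        self.data = loadFrom(path)

    def __getitem__(self, index):
        # Index the in-memory data; no disk access here.
        return self.data[index]

    def __len__(self):
        return len(self.data)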

So the time spent loading data from local storage during training is eliminated.

Yes, I am doing the same thing, but instead of loading the data in the initialization, I am loading it in __getitem__.

Loading should not happen in __getitem__. Mini-batch creation becomes very slow if the data is loaded in __getitem__.
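To illustrate the difference, here is a small standard-library sketch that counts file reads for the two approaches. The class names, load_rows, and count_reads are hypothetical, and the classes are plain Python objects that only mimic the __getitem__/__len__ protocol; with PyTorch they would subclass torch.utils.data.Dataset. It assumes the worst case, where __getitem__ re-reads the whole file each time, which may not match the original poster's code exactly.

```python
import os
import tempfile

def load_rows(path):
    # Stand-in for an expensive load from local storage.
    with open(path) as f:
        return [[float(v) for v in line.split(",")] for line in f if line.strip()]

class LoadInGetitem:
    """Reads the file again on every item access."""
    def __init__(self, path):
        self.path = path
        self.reads = 0
        self.length = len(self._load())   # one read just to learn the length

    def _load(self):
        self.reads += 1
        return load_rows(self.path)

    def __getitem__(self, index):
        return self._load()[index]

    def __len__(self):
        return self.length

class LoadInInit:
    """Reads the file once and serves items from memory."""
    def __init__(self, path):
        self.reads = 1
        self.data = load_rows(path)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

def count_reads(dataset_cls, rows):
    # Write rows to a temporary file, fetch every item once, report file reads.
    fd, path = tempfile.mkstemp(suffix=".csv")
    try:
        with os.fdopen(fd, "w") as f:
            for row in rows:
                f.write(",".join(str(v) for v in row) + "\n")
        ds = dataset_cls(path)
        items = [ds[i] for i in range(len(ds))]
        return ds.reads, items
    finally:
        os.remove(path)

rows = [[float(i), float(i) + 0.5] for i in range(10)]
lazy_reads, lazy_items = count_reads(LoadInGetitem, rows)
eager_reads, eager_items = count_reads(LoadInInit, rows)
print(f"load in __getitem__: {lazy_reads} reads; load in __init__: {eager_reads} read")
```

Both versions return the same items; the only difference is how often storage is touched, which is the cost that grows with every sample fetched.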

I am not creating any mini-batches.