Num_workers in DataLoader

Hello everyone! I have a very huge dataset of images, so huge that it takes about 10 seconds to create one batch.
To solve this problem I decided to use the “num_workers” parameter of DataLoader. For example:
trainset = datasets.ImageFolder(data_dir, data_transforms)
train_loader = torch.utils.data.DataLoader(trainset, batch_size=256, shuffle=True, num_workers=4)
But it only works for small batch sizes, for example 32; for 64 it does not work. For a batch size like 64 it only works with num_workers=0.
Can you help me and explain this phenomenon?

Maybe you are running out of memory because of your huge data.
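One way to see why batch size and num_workers interact with memory: each worker process assembles whole batches in host RAM, and the loader keeps several batches in flight, so peak memory grows with both. Here is a rough back-of-the-envelope estimate (my own sketch with assumed image dimensions and an assumed number of in-flight batches, not a measurement of PyTorch internals):

```python
def host_ram_bytes(batch_size, num_workers, sample_bytes, batches_in_flight=2):
    """Rough peak host RAM used by DataLoader workers: each worker may
    hold `batches_in_flight` batches of `batch_size` samples at once."""
    return num_workers * batches_in_flight * batch_size * sample_bytes

# e.g. 224x224 RGB float32 tensors: 224 * 224 * 3 * 4 bytes per sample
sample = 224 * 224 * 3 * 4
print(host_ram_bytes(64, 4, sample) / 2**30)  # rough GiB of host memory
```

So doubling the batch size roughly doubles the workers' memory footprint, which would explain why batch_size=64 only works with num_workers=0.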

What is the relationship between batch_size and num_workers?
Does num_workers have an impact on GPU memory?

Yes, it is possible, but I do not know how to fix that :frowning:

Sorry, I do not know. I thought that num_workers is the number of processes. These processes create your batches, and because of this they run on the CPU, in my opinion.

Hi, have you fixed the num_workers problem? I have met the same problem. Can you share more details about your machine, your OS, etc.?

I have doubts about how to set num_workers in DataLoader and how it works. If I set a larger number, can I shorten the training time of my model? Can anybody explain how it speeds up training?
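The speedup comes from overlap: with num_workers > 0, batches are prepared in worker processes while the training step runs, so loading time can be hidden behind computation. A rough idealized model (a sketch with made-up timings, ignoring worker startup and IPC overhead):

```python
def epoch_time(n_batches, load_s, train_s, num_workers):
    """Idealized epoch time: with num_workers=0, loading and training
    run serially; with workers, loading is amortized across C processes
    and overlaps the training step, so the slower of the two dominates."""
    if num_workers == 0:
        return n_batches * (load_s + train_s)
    return n_batches * max(load_s / num_workers, train_s)

# e.g. 100 batches, 10 s to load a batch, 2 s to train on it
print(epoch_time(100, 10, 2, 0))  # serial: 1200 s
print(epoch_time(100, 10, 2, 8))  # 8 workers: 200 s (training-bound)
```

In this model, adding workers beyond the point where loading is faster than training buys nothing, which is why a larger num_workers is not always faster.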

Have you solved the problem? I had the same problem.

I just put my comments on another thread: Guidelines for assigning num_workers to DataLoader


Hello, I’m a PyTorch beginner, but I want to share my case.

I supposed that each worker is assigned one batch of samples in the multi-process jobs.

Let’s say,

  • N: total number of samples in dataloader
  • B: batch size
  • C: num_workers

N = B * C would be the appropriate relationship when you choose the batch size and the number of workers:
if N < B * C, then some workers did not work.
if N > B * C, then all workers worked.
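The cases below are consistent with a simple round-robin picture: the sampler produces ceil(N/B) batch index lists and hands them out to the workers in turn, so at most ceil(N/B) workers ever receive work. A small sketch of that assignment (my own model of the observed behavior, not PyTorch internals):

```python
import math

def batches_per_worker(n_samples, batch_size, num_workers):
    """Hand out batches round-robin and return the number of samples
    each worker ends up loading (the last batch may be short)."""
    n_batches = math.ceil(n_samples / batch_size)
    counts = [0] * num_workers
    for i in range(n_batches):
        counts[i % num_workers] += min(batch_size,
                                       n_samples - i * batch_size)
    return counts

# N = 127 samples, B = 20, C = 12 workers
print(batches_per_worker(127, 20, 12))
# -> [20, 20, 20, 20, 20, 20, 7, 0, 0, 0, 0, 0]: only 7 workers get work
```

This reproduces Cases 3–6 below: with B = 20 only 7 of 12 workers are busy, with B = 60 only 3, and with B = 127 only 1.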

Here is my test case,
I tested with varying batch size and num_workers.
Total number of samples for the DataLoader = 127 (N)
Batch size (B), num_workers (C)

myDataset = classMyDataSet() # classMyDataSet is a subclass of torch.utils.data.Dataset
myLoader = torch.utils.data.DataLoader(myDataset, batch_size=B, shuffle=False, num_workers=C, pin_memory=True)
bx, by = next(iter(myLoader)) # at this line, the DataLoader starts to work

Case 1. B = 1 and C = 0

  • Works well with one core
  • Only 1 sample was processed

Case 2. B = 1~10 and C = 1~12 (my machine has 12 available threads)

  • Works well with C cores
  • Only B*C samples were processed (i.e., the N < B * C case)

Case 3. B = 11 and C = 12

  • Works well with C cores; however, one core processed only 7 samples
  • 127 samples (i.e., N) were processed (i.e., the N > B * C case)

Case 4. B = 20 and C = 12

  • Only 7 cores worked (6 cores processed 20 samples each, and 1 core processed 7 samples), while 5 cores didn’t work
  • 127 samples (i.e., N) were processed (i.e., the N > B * C case)

Case 5. B = 60 and C = 12

  • Only 3 cores worked (2 cores processed 60 samples each and 1 core processed 7), while 9 cores didn’t work
  • 127 samples (i.e., N) were processed (i.e., the N > B * C case)

Case 6. B = 127 and C = 12

  • Only one core worked, processing all 127 samples; 11 cores didn’t work
  • 127 samples (i.e., N) were processed (i.e., the N > B * C case)

I have a question about this issue.
Intuitively, I expected the workers to split each batch of samples among themselves.
However, B * C samples were loaded when I called the batch loader (i.e., at the line “bx, by = next(iter(myLoader))”).
Is there any reason for this?
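My understanding (an assumption about the prefetching behavior, worth checking against the DataLoader docs): each worker loads whole batches rather than individual samples, and when iteration starts the loader immediately queues one batch per worker, so all C workers begin loading at once. That is why roughly B * C samples have already been read by the time the first next() returns. A sketch of that accounting:

```python
import math

def samples_read_at_first_next(n_samples, batch_size, num_workers):
    """Each worker is queued one batch up front, so up to
    min(C, ceil(N/B)) batches are in flight when next() is called."""
    n_batches = math.ceil(n_samples / batch_size)
    queued = min(num_workers, n_batches)
    return sum(min(batch_size, n_samples - i * batch_size)
               for i in range(queued))

# N = 127, B = 10, C = 12: all 12 workers each grab a 10-sample batch
print(samples_read_at_first_next(127, 10, 12))  # -> 120 (= B * C)
```

Under this model, the parallelism is across batches, not within a batch, which matches the observation that a single batch is always built by a single worker.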

Thanks to PyTorch developers and contributors!


Hello! I don’t understand the meaning of “total number of samples in dataloader”. Is that the number of samples in the dataset we load with the DataLoader? Could you give an example?

Seems like PyTorch sets up C worker processes, and each worker reads B samples (one full batch) from the dataset, so C*B samples are loaded in one round of prefetching.