Why are multiple threads spawned even with num_workers=0 in the PyTorch DataLoader?

torch: 1.6.0
torchvision: 0.7.0

I created a custom dataloader that applies custom augmentations to each image and converts it to a tensor. Even with the custom augmentations disabled and num_workers=0 in the DataLoader (I also tried leaving it at the default, i.e. not setting it at all), the process still uses around 15-20 threads, each at roughly 9% CPU rather than a full core. I am not sure what is causing this, and I would like to keep those resources free for the compute-intensive processes of other users on the machine.
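For reference, the DataLoader is created roughly like this (the dataset below is a dummy stand-in for my custom image dataset, and the batch size is a placeholder rather than the exact value from my script):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy stand-in for the custom image dataset described above
dataset = TensorDataset(torch.randn(100, 3, 224, 224),
                        torch.zeros(100, dtype=torch.long))

loader = DataLoader(dataset, batch_size=32, shuffle=True,
                    num_workers=0, pin_memory=False)

for images, labels in loader:
    pass  # iterate once to reproduce the behaviour described above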


Things I have already tried:

  1. Setting OMP_NUM_THREADS and other environment variables via os.environ in Python to restrict the number of threads (see the snippet after this list)
  2. Trying different combinations of the num_workers and pin_memory parameters
  3. Optimizing the code so that metrics are no longer computed by copying tensors to the CPU and using numpy
  4. I originally installed torch with conda when creating the environment, but later reinstalled it with pip because of some issues. Could that be part of the problem?
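For item 1, the environment variables were set before importing torch, roughly like this (the exact list of variables is from memory and may not match my script one-to-one):

import os

# These only take effect if set before torch / numpy are imported
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["NUMEXPR_NUM_THREADS"] = "1"

import torch

torch.set_num_threads(1)          # limit intra-op parallelism
torch.set_num_interop_threads(1)  # limit inter-op parallelism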

@ptrblck: Can you help me understand what is actually causing this? Or does the API have some internal operations that are distributed over multiple threads by default?

I would check which operations are actually using multiple threads. You could do this by running the dataloader operations outside of a DataLoader and seeing if they still use multiple threads; that would help rule out whether the problem is in the DataLoader itself or in the operations it performs.
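Something along these lines (the image here is a synthetic placeholder; use one of your real samples) would let you run the same transform in a plain loop while you watch the process in htop/top:

import threading
import torch
from PIL import Image
from torchvision import transforms

transform = transforms.Compose([transforms.ToTensor()])
image = Image.new("RGB", (224, 224))   # placeholder; substitute a real image

print("torch intra-op threads:", torch.get_num_threads())
for _ in range(10000):                 # keep the CPU busy long enough to observe
    tensor = transform(image)
print("python-level threads:", threading.active_count())

Note that threading.active_count() only sees Python threads; native OpenMP/MKL threads are easier to count with htop (press H) or by looking at /proc/<pid>/task.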

self.transform = transforms.Compose([transforms.ToTensor()])   # set up once in __init__
image = self.transform(image)                                   # image -> CHW float tensor in [0, 1]
label = torch.tensor(0) if class_ == 'normal' else torch.tensor(1)
return image, label

These are the only operations performed inside the dataloader.

Note that ToTensor() performs a few operations under the hood, such as scaling to [0, 1], dtype conversion, and a memory-layout change, so I would check whether any of those individual operations shows up as using multiple threads on your system.
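A simplified sketch of those steps for a uint8 RGB PIL image (not the exact torchvision implementation) would be:

import numpy as np
import torch
from PIL import Image

image = Image.new("RGB", (224, 224))           # placeholder input

array = np.array(image)                        # PIL -> HWC uint8 ndarray
tensor = torch.from_numpy(array)               # zero-copy wrap as a tensor
tensor = tensor.permute(2, 0, 1).contiguous()  # HWC -> CHW memory-layout change
tensor = tensor.float().div(255)               # dtype conversion and scaling to [0, 1]
print(tensor.shape, tensor.dtype)              # torch.Size([3, 224, 224]) torch.float32

Looping over each of these lines separately while watching CPU usage should show whether one of them is responsible for the extra threads.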