"Suggested max num workers is 2" but I have 96 cores?

Hi, I’m confused. I’ve got 96 CPUs with 2 threads each.
I set num_workers=12 for my DataLoader, but I’m getting a warning:

UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(

Why is the suggested number only 2?

I can see from the DataLoader source code that it sets

max_num_worker_suggest = len(os.sched_getaffinity(0))

and when I run len(os.sched_getaffinity(0)) manually I get 96.
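For anyone comparing numbers on their own machine, here is a quick sketch (Linux only, since `os.sched_getaffinity` is not available on all platforms) that contrasts the raw logical-CPU count with the affinity-restricted count the DataLoader actually looks at:

```python
import os

# Total logical CPUs visible to the OS.
total = os.cpu_count()

# CPUs this process is actually allowed to run on -- this is the
# value the DataLoader bases its suggested max workers on.
allowed = len(os.sched_getaffinity(0))

print(f"logical CPUs: {total}, allowed by affinity: {allowed}")
```

If `allowed` is smaller than `total`, something (a container, a batch scheduler, or the process that launched Python) has pinned the process to a subset of the CPUs.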

On closer inspection, this seems to be a Jupyter notebook problem.

I get 96 when I run len(os.sched_getaffinity(0)) in the Python REPL from the command line.

The same command executed in a Jupyter notebook (in the same Python environment) returns only 2.

(I’m now looking around for a way to keep Jupyter from limiting my process to 2 CPUs, and not finding much encouraging news on StackExchange so far.)

Running

os.system("taskset -c -p 0-95 %d" % os.getpid())

from within Jupyter got the number back up to 96.
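As an alternative to shelling out to `taskset`, the standard library exposes the same syscall directly via `os.sched_setaffinity` (Linux only). A minimal sketch, assuming CPUs are numbered 0..N-1; on systems where a cgroup/cpuset genuinely restricts the process, the call can raise `OSError` instead of widening the mask:

```python
import os

# Current affinity mask (the set of CPU indices this process may run on).
before = os.sched_getaffinity(0)

try:
    # Ask to run on all logical CPUs (pid 0 means the current process).
    os.sched_setaffinity(0, range(os.cpu_count()))
except OSError:
    # A hard cgroup/cpuset limit may reject the wider mask; keep the old one.
    pass

after = os.sched_getaffinity(0)
print(f"CPUs allowed before: {len(before)}, after: {len(after)}")
```

This only helps when the narrow mask was inherited from the launching process (as appears to be the case with Jupyter here), not when an administrator has imposed a hard limit.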


I’m encountering a similar problem on an HPC system with 80 cores:

/ccc/products2/python3-3.10.6/Rhel_8__x86_64/gcc--8.3.0__openmpi--4.0.1/cuda/lib/python3.10/site-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 16 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(

(The warning is repeated once per spawned process.) I’m using torchvision’s default training script.