ValueError: signal number 32 out of range when loading data with `num_workers>0`

Hi guys,

When I use a DataLoader to load data with num_workers>0, I get the following error:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/threading.py", line 914, in _bootstrap_inner
    self.run()
  File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
    signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
  File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/signal.py", line 60, in pthread_sigmask
    sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
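
My usage is roughly the following (a simplified sketch; MyDataset stands in for my real dataset):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    # placeholder standing in for the real dataset
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return torch.randn(3, 224, 224), idx % 10

loader = DataLoader(MyDataset(), batch_size=32, num_workers=12)

# iterating with num_workers > 0 triggers the error above
for images, labels in loader:
    pass
```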

For now I have to set num_workers to 0, but then data loading takes too much time in each iteration.
Is there a good way to solve this problem? Any advice would be highly appreciated.

Thanks in advance.

Could the error be related to the OS configuration?

I have run the same code on two servers with the same OS (Ubuntu 16.04.1) but different GPUs, with num_workers=12 set in the DataLoader on both. One runs normally, but the other raises the ValueError mentioned above.

I’ve never seen this error on my machine, but searching for it, it looks like Python 3.7 could fix this issue.
Would it be possible for you to create a new conda environment and try the code with Python 3.7?
If not, could you post a code snippet so that we can reproduce this error?

Oh, @ptrblck, thank you very much! Your help is greatly appreciated!

It was resolved by updating Python 3.5 -> Python 3.7: https://bugs.python.org/issue33329
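
In case it helps anyone else: you can check whether a given interpreter is affected without involving PyTorch at all, since the DataLoader workers just hit this multiprocessing code path (the call below is taken from the traceback above):

```python
import signal

# multiprocessing/resource_sharer.py makes this exact call when serving
# file descriptors to worker processes (see the traceback above).
# On affected builds it raises "ValueError: signal number 32 out of range";
# fixed versions (bpo-33329) complete it without error.
old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)  # restore the original mask
print("OK - this Python build is not affected")
```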

Another question, though it is not related to this topic:
If I have two cards (card0 and card1) and train my network only on card1, setting pin_memory=True in the DataLoader occupies a portion of card0’s memory (this seems to be the default). Is there any way to assign that memory consumption to card1 instead?

Thanks a lot. :wink:

PS: On another machine the same code works fine with Python 3.5, which is strange, and I don't know what causes the difference.

I assume a small amount of memory is being allocated on GPU0.
If that’s the case, you can run your script using:

CUDA_VISIBLE_DEVICES=1 python script.py args

to hide GPU0 from your script. GPU1 will then be remapped to GPU0, so you would use 'cuda:0' or just 'cuda' in your script.
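
For example (a minimal sketch with a placeholder model):

```python
# launched as: CUDA_VISIBLE_DEVICES=1 python script.py
import torch
import torch.nn as nn

device = torch.device('cuda')  # physical GPU1 is now the only visible device
model = nn.Linear(10, 2).to(device)  # placeholder model

x = torch.randn(4, 10, device=device)
print(model(x).device)  # prints cuda:0, which maps to physical GPU1
```

The small CUDA context created by pin_memory=True should then also land on the remapped device.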

Which PyTorch version are you using btw? I thought this issue was solved recently.

I haven’t seen this issue before, but apparently it’s related to some multiprocessing functions in Python.

Thanks a lot.

I will give it a try.
My PyTorch version is 1.0.1.post2; on the other machine it is 1.0.0, which worked fine before.

Yes, it is a Python bug due to this issue: https://bugs.python.org/issue33329. It was fixed in 3.6.6.

I get this error when running Python 3.7.7.