When I use DataLoader to load data with num_workers > 0, I get the following error:
Exception in thread Thread-1:
Traceback (most recent call last):
File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/threading.py", line 914, in _bootstrap_inner
File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/threading.py", line 862, in run
File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/multiprocessing/resource_sharer.py", line 139, in _serve
signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
File "/home/Maro/anaconda3/envs/pytorch/lib/python3.5/signal.py", line 60, in pthread_sigmask
sigs_set = _signal.pthread_sigmask(how, mask)
ValueError: signal number 32 out of range
So I have to set num_workers to 0, but then data loading takes too much time in each iteration.
Is there any good way to solve this problem? Any advice would be highly appreciated.
Thanks in advance.
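A minimal sketch of the kind of setup that triggers it (the dataset here is synthetic and its sizes are arbitrary; any DataLoader with num_workers > 0 spawns worker processes, which go through multiprocessing.resource_sharer):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    """Stand-in dataset: fake images and fake labels."""
    def __len__(self):
        return 64

    def __getitem__(self, idx):
        return torch.randn(3, 32, 32), idx % 10

if __name__ == "__main__":
    # num_workers > 0 is what exercises the failing multiprocessing path
    loader = DataLoader(RandomDataset(), batch_size=16, num_workers=2)
    for images, labels in loader:
        print(images.shape, labels.shape)
```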
Could the error be related to the OS configuration? I have run the same code on two servers with the same OS (Ubuntu 16.04.1) but different GPUs, with num_workers=12 in the DataLoader on both. One runs normally, but the other raises the ValueError mentioned above.
I’ve never seen this error on my machine, but searching for it, it looks like Python 3.7 could fix this issue.
Would it be possible for you to create a new conda environment and try the code with Python 3.7?
If not, could you post a code snippet so that we can reproduce this error?
Oh, @ptrblck, thank you very much, your help is greatly appreciated!
It was resolved by updating Python 3.5 -> Python 3.7: https://bugs.python.org/issue33329
Another question, though it is not related to this topic:
If I have two cards (card0 and card1) and train my network only on card1, the DataLoader still occupies a portion of card0’s memory (this seems to be the default). Is there any way to confine the memory consumption to card1?
Thanks a lot.
PS: On another machine it works fine with Python 3.5; it is so strange and I don’t know what causes the difference.
I assume a small amount of memory is being allocated on GPU0.
If that’s the case, you can run your script using:
CUDA_VISIBLE_DEVICES=1 python script.py args
to hide GPU0 from your script. GPU1 will then be remapped to GPU0, so you would simply use
'cuda' in your script.
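If prepending the variable on the command line is inconvenient, the same remapping can be done from inside the script, as long as it runs before CUDA is initialized. A minimal sketch (the torch lines are commented out, and `model` is a hypothetical placeholder):

```python
import os

# Equivalent of `CUDA_VISIBLE_DEVICES=1 python script.py args`, set from
# inside the script itself. It must run before torch initializes CUDA,
# so safest is to set it before `import torch`.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# import torch
# device = torch.device("cuda")  # physical card1 is now visible as cuda:0
# model = model.to(device)       # hypothetical model
```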
Which PyTorch version are you using, btw? I thought this issue was solved recently.
I haven’t seen this issue before, but apparently it’s related to some
multiprocessing functions in Python.
Thanks a lot.
I will give it a try.
My PyTorch version is 1.0.1.post2, and on another machine is 1.0.0 which worked fine before.
Yes, it is a Python bug, due to this issue: https://bugs.python.org/issue33329 . It got fixed in 3.6.6.
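For anyone curious, the failing call can be reproduced directly: it is the same call multiprocessing/resource_sharer.py makes in the traceback above, blocking every signal number from 1 to NSIG - 1. On affected Python/libc combinations, some signal numbers in that range (32 in the traceback) are rejected; on a patched Python the call succeeds:

```python
import signal

# The call from the traceback: block all signal numbers 1 .. NSIG - 1.
try:
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, range(1, signal.NSIG))
    # Restore the previous mask so the rest of the process is unaffected.
    signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    print("pthread_sigmask accepted the full range (patched Python)")
except ValueError as exc:
    print("hit the bug:", exc)  # e.g. "signal number 32 out of range"
```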
I get this error when running Python 3.7.7.