How can I resolve `RuntimeError: can't start new thread` when using DataParallel, even though `threading.active_count()` is only 6?

```
Epoch: [1/24] | Iter: [6061/14863] | Time/Batch: 0.524 | CurrLoss: 11.3428 | AveLoss: 11.3451 | MarginAcc@1: 0.016757% | MarginAcc@5: 0.048788% | Acc@1: 0.016757% | Acc@5: 0.048788%
6
Epoch: [1/24] | Iter: [6071/14863] | Time/Batch: 0.524 | CurrLoss: 11.3428 | AveLoss: 11.3451 | MarginAcc@1: 0.016793% | MarginAcc@5: 0.048965% | Acc@1: 0.016793% | Acc@5: 0.048965%
6
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7f8fa3d142f0>
Traceback (most recent call last):
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 717, in __del__
    self._shutdown_workers()
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 696, in _shutdown_workers
    self.worker_result_queue.put(None)
  File "/run/mount/sdd1/anaconda3/lib/python3.7/multiprocessing/queues.py", line 87, in put
    self._start_thread()
  File "/run/mount/sdd1/anaconda3/lib/python3.7/multiprocessing/queues.py", line 170, in _start_thread
    self._thread.start()
  File "/run/mount/sdd1/anaconda3/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
Traceback (most recent call last):
  File "train.py", line 328, in <module>
    main()
  File "train.py", line 193, in main
    train(device, data_loader, model, margin_linear, lambda_func, criterion, optimizer, scheduler, args)
  File "train.py", line 219, in train
    output = model(input)
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/run/mount/sdd1/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 73, in parallel_apply
    thread.start()
  File "/run/mount/sdd1/anaconda3/lib/python3.7/threading.py", line 847, in start
    _start_new_thread(self._bootstrap, ())
RuntimeError: can't start new thread
```
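The first traceback fails inside `multiprocessing.Queue.put`, which starts a background feeder thread (`queues.py`, `_start_thread`) the first time it is called on a queue. The stdlib-only sketch below does not touch PyTorch; it just shows that a single `put()` on a fresh queue raises `threading.active_count()` by one. The helper name `feeder_thread_started_by_put` is hypothetical.

```python
import multiprocessing as mp
import threading


def feeder_thread_started_by_put():
    """Return (thread count before put, thread count after put) for a fresh multiprocessing.Queue."""
    q = mp.Queue()
    before = threading.active_count()
    q.put(None)        # the first put() starts the queue's background feeder thread
    after = threading.active_count()
    q.get()            # drain the item so nothing is left in the pipe
    q.close()          # signal that this process will put no more data
    q.join_thread()    # wait for the feeder thread to exit (requires close() first)
    return before, after


if __name__ == "__main__":
    before, after = feeder_thread_started_by_put()
    print(f"threads before put: {before}, after put: {after}")
```

If the OS refuses to create that feeder thread, `put()` fails with the same `RuntimeError` seen in the `_shutdown_workers` traceback.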

I have tried both Python 3.5 and Python 3.7, and both give the same error.

The error occurs whether I use 1 GPU or 2 GPUs with DataParallel. There is no problem when I run on 1 GPU without DataParallel.
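The second traceback fails at `thread.start()` in `parallel_apply.py`, so DataParallel is creating threads during the forward pass. The sketch below is a stdlib-only approximation of a thread-per-replica pattern, not PyTorch's actual implementation; `fake_forward` and `parallel_apply_sketch` are hypothetical names. It illustrates that threads started and joined inside each forward call are already gone when the count is printed between iterations, so the printed value of 6 does not show how many threads exist at the moment `start()` fails.

```python
import threading


def fake_forward(replica_id, results):
    # Hypothetical stand-in for running one model replica on one device.
    results[replica_id] = replica_id * 2


def parallel_apply_sketch(num_replicas):
    """Start one thread per replica, then join them all before returning."""
    results = {}
    threads = [threading.Thread(target=fake_forward, args=(i, results))
               for i in range(num_replicas)]
    for t in threads:
        # Thread.start() is where "can't start new thread" is raised if the OS refuses a thread.
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for step in range(3):
        parallel_apply_sketch(num_replicas=2)
        # The worker threads have exited by now, so the count is back at its baseline.
        print(step, threading.active_count())
```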

I get the same error even with the ImageNet example in the PyTorch examples repository.

The only change I make to use DataParallel is adding the line `model = nn.DataParallel(model)` to my code. Am I using it wrong? What else do I need to do?
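`threading.active_count()` only counts `threading.Thread` objects in the calling process, so it does not include threads inside DataLoader worker processes or threads started by native libraries. Assuming a Linux machine, one common reason for `can't start new thread` is reaching the per-user limit on processes/threads (`RLIMIT_NPROC`, reported by `ulimit -u`), which counts threads across all of the user's processes. The hypothetical helper below prints these numbers side by side; it is a diagnostic sketch, not a confirmed cause of this error.

```python
import os
import resource
import threading


def thread_diagnostics():
    """Hypothetical helper: collect thread counts and the per-user thread limit."""
    info = {
        # Only threads created through Python's threading module in this process.
        "python_threads": threading.active_count(),
    }
    # Linux-only: every OS-level thread of this process, including native-library threads.
    task_dir = "/proc/self/task"
    if os.path.isdir(task_dir):
        info["os_threads_this_process"] = len(os.listdir(task_dir))
    # Assumption: on Linux, RLIMIT_NPROC caps threads per real user ID across all processes.
    soft, hard = resource.getrlimit(resource.RLIMIT_NPROC)
    info["nproc_soft_limit"] = soft
    info["nproc_hard_limit"] = hard
    return info


if __name__ == "__main__":
    for key, value in thread_diagnostics().items():
        print(f"{key}: {value}")
```

Comparing the OS-level count (and the user's total thread count from tools such as `ps`) against the soft limit can show whether a limit is being hit even though `threading.active_count()` stays small.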
