nn.DataParallel(model).cuda() hangs

(Pete Tae-hoon Kim) #1


If I move my network to the GPU with

model = model.cuda()

everything is OK. The model is big, so it consumes 91% of video memory.

If I instead use

model = nn.DataParallel(model).cuda()

then it seems to make progress at first, but soon it hangs. When I press CTRL-C, I always get a stack trace like this:

Traceback (most recent call last):
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/process.py", line 249, in _bootstrap
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/home/polphit/anaconda3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 28, in _worker_loop
    r = index_queue.get()
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/queues.py", line 343, in get
    res = self._reader.recv_bytes()
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/home/polphit/anaconda3/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)

I tried on two different machines and got the same issue.

  • Ubuntu 16.04,
  • conda 4.3.14,
  • pytorch installed from source,
  • python 3.6.0.final.0
  • requests 2.12.4
  • CUDA 8.0
  • cuDNN 5.1

When I run the same code on a machine without conda (plain python3), it works well.

Can anyone give me a clue about how to resolve this issue?
Thank you.

(Adam Paszke) #2

That’s a stack trace of a data loader process; can you paste the full error into a gist and link it here?

(Pete Tae-hoon Kim) #3

Oh, that stack trace is all I could get, since the process just hangs without raising an error.
I guess it’s some kind of synchronization issue.
I have four networks netA, netB, netC, netD, which were

netA = nn.DataParallel(netA).cuda()
netB = nn.DataParallel(netB).cuda()
netC = netC.cuda(0)
netD = netD.cuda(1)

(I have two GPU devices)

Flow is

i (input) -> netA ---> netB -> x (output #1)
                   +-> netC -> y (output #2)
                   +-> netD -> z (output #3)

If this is not enough to guess the cause, I can simplify my code to reproduce the issue with a minimal data upload.
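In code, the forward pass looks roughly like this (a simplified sketch; the tensor names are placeholders):

feat = netA(i)          # data-parallel over both GPUs
x = netB(feat)          # data-parallel over both GPUs
y = netC(feat.cuda(0))  # single-GPU module on device 0
z = netD(feat.cuda(1))  # single-GPU module on device 1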

(Pete Tae-hoon Kim) #4

Oh, when I add

torch.cuda.synchronize()

at the end of each batch, one machine works properly, although the other machine still has the same issue.
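To show where the call sits, the training loop looks roughly like this (a minimal sketch with hypothetical stand-in networks, not my real code):

import torch
import torch.nn as nn

netA = nn.DataParallel(nn.Linear(128, 128)).cuda()
netB = nn.DataParallel(nn.Linear(128, 10)).cuda()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(
    list(netA.parameters()) + list(netB.parameters()), lr=0.01)

for _ in range(100):  # stand-in for iterating over a DataLoader
    inputs = torch.randn(64, 128).cuda()
    target = torch.randn(64, 10).cuda()

    loss = criterion(netB(netA(inputs)), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # Block until all queued kernels (including the NCCL collectives issued
    # by the DataParallel modules) finish before the next batch starts.
    torch.cuda.synchronize()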

(Adam Paszke) #5

Oh yeah, this will happen. It’s because nn.DataParallel uses NVIDIA’s NCCL library, and NCCL just deadlocks if you happen to issue two calls at the same time… I guess we’ll need to add some mutexes.
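To make the failure mode concrete, something like this (a hypothetical repro sketch, not code from this thread) issues collectives from two DataParallel modules on the same pair of GPUs at the same time, which is exactly the situation that can deadlock:

import threading
import torch
import torch.nn as nn

# Two independent data-parallel modules sharing the same GPUs.
netA = nn.DataParallel(nn.Linear(512, 512)).cuda()
netB = nn.DataParallel(nn.Linear(512, 512)).cuda()

def forward(net, x):
    # Each call broadcasts parameters over NCCL; if collectives from the
    # two threads interleave on the same devices, NCCL can hang.
    net(x)

x = torch.randn(64, 512).cuda()
t1 = threading.Thread(target=forward, args=(netA, x))
t2 = threading.Thread(target=forward, args=(netB, x))
t1.start(); t2.start()
t1.join(); t2.join()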

(Adam Paszke) #6

Unfortunately, even if we add those locks, doing that in two processes that use the same GPUs in DataParallel will deadlock too…

(Jin Ma) #7

So… is it a bug in pytorch? I’m running into the same issue.

(Adam Paszke) #8

No, it’s a bug in NCCL (NVIDIA’s library). But you probably shouldn’t be using the same GPU in multiple data parallel jobs anyway.
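One way to follow that advice is to pin each job to its own device before CUDA is initialized, e.g. (a sketch, assuming a two-GPU box with one job per GPU):

import os

# Pin this process to one GPU so concurrent jobs never share a device;
# use "1" in the second job. Must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import torch
print(torch.cuda.device_count())  # this process now sees exactly one GPU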


I have a similar problem: the process hangs if I use DataParallel on 2 K80 GPUs. Do you know what the issue might be, @apaszke? If I restrict it to one GPU, everything works fine.


Hi everyone, NVIDIA’s @ngimel has investigated this problem, and the hangs might not be related to pytorch. She has written a detailed comment here on figuring out the issue and working around it:

Please have a look and see if it applies to you.
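For reference, problems of this kind often come down to broken GPU peer-to-peer (p2p) communication. A quick way to probe it from pytorch (a sketch; the exact diagnosis in the linked comment may differ):

import torch

# Check whether each GPU reports peer access to the other; False here on
# a multi-GPU box is a hint that p2p, which NCCL relies on, is broken.
print(torch.cuda.can_device_access_peer(0, 1))
print(torch.cuda.can_device_access_peer(1, 0))

If p2p turns out to be broken at the system level, NCCL’s p2p path can be disabled by setting the NCCL_P2P_DISABLE=1 environment variable before launching the script.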