Hi all,
I have spent the past day trying to figure out how to use multiple GPUs. In theory, parallelizing a model across multiple GPUs is supposed to be as easy as wrapping it with nn.DataParallel. However, this does not work for me. To reproduce the problem with the simplest, most canonical example I could find, I ran the code from the Data Parallelism tutorial line for line. The output is below; it is the same output I get every time I try to run PyTorch with multiple GPUs:
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
<ipython-input-3-0f0d83e9ef13> in <module>
1 for data in rand_loader:
2 input = data.to(device)
----> 3 output = model(input)
4 print("Outside: input size", input.size(),
5 "output_size", output.size())
/usr/local/lib/python3.6/site-packages/torch/nn/modules/module.py in __call__(self, *input, **kwargs)
487 result = self._slow_forward(*input, **kwargs)
488 else:
--> 489 result = self.forward(*input, **kwargs)
490 for hook in self._forward_hooks.values():
491 hook_result = hook(self, input, result)
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in forward(self, *inputs, **kwargs)
141 return self.module(*inputs[0], **kwargs[0])
142 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 143 outputs = self.parallel_apply(replicas, inputs, kwargs)
144 return self.gather(outputs, self.output_device)
145
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
151
152 def parallel_apply(self, replicas, inputs, kwargs):
--> 153 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
154
155 def gather(self, outputs, output_device):
/usr/local/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
73 thread.start()
74 for thread in threads:
---> 75 thread.join()
76 else:
77 _worker(0, modules[0], inputs[0], kwargs_tup[0], devices[0])
/usr/local/lib/python3.6/threading.py in join(self, timeout)
1054
1055 if timeout is None:
-> 1056 self._wait_for_tstate_lock()
1057 else:
1058 # the behavior of a negative timeout isn't documented, but
/usr/local/lib/python3.6/threading.py in _wait_for_tstate_lock(self, block, timeout)
1070 if lock is None: # already determined that the C code is done
1071 assert self._is_stopped
-> 1072 elif lock.acquire(block, timeout):
1073 lock.release()
1074 self._stop()
KeyboardInterrupt:
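Note that it hangs; I have to interrupt it with Ctrl+C to stop it. The traceback is the same every time: the main thread is stuck in thread.join() inside parallel_apply, waiting for the per-GPU worker threads to finish. It looks like some sort of deadlock, although I do not understand how or why it happens.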
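To make the traceback easier to read, here is a simplified, stdlib-only model of the start/join loop shown in parallel_apply (lines 73 to 75 above). It is not PyTorch's actual code: the function name parallel_apply_sketch and the lambda "workers" are hypothetical stand-ins for the per-GPU replicas. One worker blocks forever on an event that is never set, standing in for a replica whose forward pass never returns (why that would happen on my machine is exactly what I don't know). The real join() has no timeout, so the main thread waits until Ctrl+C, which is why the KeyboardInterrupt lands in _wait_for_tstate_lock; the sketch adds a timeout only so it can report which worker is stuck.

```python
import threading


def parallel_apply_sketch(workers, timeout=None):
    """Run each callable in its own thread, then join the threads in order.

    Mirrors the thread.start() / thread.join() loop in the traceback.
    With timeout=None, join() blocks forever if any worker never returns.
    """
    results = [None] * len(workers)

    def run(i, fn):
        results[i] = fn()

    # Daemon threads so a stuck worker does not keep the process alive at exit.
    threads = [threading.Thread(target=run, args=(i, fn), daemon=True)
               for i, fn in enumerate(workers)]
    for t in threads:
        t.start()

    stuck = []
    for i, t in enumerate(threads):
        t.join(timeout)
        if t.is_alive():  # join timed out: this worker never finished
            stuck.append(i)
    return results, stuck


never_set = threading.Event()  # nothing ever calls never_set.set()
workers = [
    lambda: "gpu0 done",
    lambda: never_set.wait(),  # blocks forever, like a hung replica
    lambda: "gpu2 done",
]
results, stuck = parallel_apply_sketch(workers, timeout=1.0)
print(results, stuck)
# ['gpu0 done', None, 'gpu2 done'] [1]
```

Called with timeout=None, the same function would never return, which matches the hang I see.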
Some information about my system:
Operating System: Ubuntu 16.04
GPUs: 4x GTX 1080 Ti
PyTorch version: 1.0.1
CUDA version: 10.0
NVIDIA Driver: 415
I have tried everything from making only specific permutations of my GPUs visible to CUDA to reinstalling everything CUDA-related, but I still cannot figure out why I cannot run with multiple GPUs. If anyone could point me in the right direction, I would greatly appreciate it.
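For anyone who wants to repeat the GPU-subset test systematically, here is a minimal sketch that builds one environment per pair of GPUs with CUDA_VISIBLE_DEVICES restricted to that pair. It assumes 4 GPUs indexed 0 to 3; the helper name visible_device_envs and the script name tutorial.py are hypothetical. Each run goes in a fresh subprocess because CUDA_VISIBLE_DEVICES must be set before CUDA is initialized in the process.

```python
import itertools
import os

NUM_GPUS = 4  # four 1080 Tis in this machine (assumed to be indices 0-3)


def visible_device_envs(num_gpus, group_size=2):
    """Yield (label, env) pairs, one per subset of GPUs.

    Each env is a copy of the current environment with CUDA_VISIBLE_DEVICES
    restricted to that subset, suitable for passing to subprocess.run(env=...).
    """
    for subset in itertools.combinations(range(num_gpus), group_size):
        devices = ",".join(str(d) for d in subset)
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=devices)
        yield devices, env


envs = list(visible_device_envs(NUM_GPUS))
labels = [label for label, _ in envs]
print(labels)
# ['0,1', '0,2', '0,3', '1,2', '1,3', '2,3']

# Hypothetical usage: run the tutorial script once per pair, with a timeout
# so a hang shows up as subprocess.TimeoutExpired instead of blocking forever.
# import subprocess, sys
# for label, env in envs:
#     subprocess.run([sys.executable, "tutorial.py"], env=env, timeout=120)
```

If only some pairs hang, that would at least narrow the problem down to specific GPUs or slots.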