nn.DataParallel hangs with PyTorch 0.4.1 and CUDA 9.1.85 on TITAN V

Hi guys,

I tested a simple example that uses nn.DataParallel() to train on multiple GPUs, but it hangs.

import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable

class MyNet(nn.Module):
	def __init__(self):
		super(MyNet, self).__init__()
		self.linear = nn.Linear(2,1)
	def forward(self, x):
		h = self.linear(x)
		return h

epochs  = 2000
lr = 1e-3
momentum = 0
w_decay = 1e-5

train_data = torch.randn(288, 2)                      # dummy inputs
train_label = torch.zeros([288], dtype=torch.long)    # dummy targets (all class 0)

num_gpu = list(range(torch.cuda.device_count()))               # ids of all visible GPUs
model = nn.DataParallel(MyNet().cuda(0), device_ids=num_gpu)   # replicate the model across the cards
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr = lr, weight_decay = w_decay)

print "Starting training"

model.train()
for epoch in range(epochs):
	optimizer.zero_grad()
	inputs = Variable(train_data.cuda(0))    # batch goes to GPU 0; DataParallel scatters it along dim 0
	labels = Variable(train_label.cuda(0))
	outputs = model(inputs)                  # <-- this is where it hangs
	loss = criterion(outputs, labels)
	loss.backward()
	optimizer.step()
	print("epoch{}, loss: {}".format(epoch, loss.data.item()))

It hangs at the first forward pass, i.e. at outputs = model(inputs). While it is stuck, nvidia-smi gives

Thu Aug 16 09:56:56 2018       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.67                 Driver Version: 390.67                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             Off  | 00000000:1B:00.0 Off |                  N/A |
| 28%   39C    P8    25W / 250W |   1087MiB / 12066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  TITAN V             Off  | 00000000:1C:00.0 Off |                  N/A |
| 28%   41C    P2    39W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   2  TITAN V             Off  | 00000000:1D:00.0 Off |                  N/A |
| 31%   45C    P2    41W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   3  TITAN V             Off  | 00000000:1E:00.0 Off |                  N/A |
| 31%   45C    P2    40W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   4  TITAN V             Off  | 00000000:3D:00.0 Off |                  N/A |
| 28%   39C    P2    38W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   5  TITAN V             Off  | 00000000:3E:00.0 Off |                  N/A |
| 28%   41C    P2    40W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   6  TITAN V             Off  | 00000000:3F:00.0 Off |                  N/A |
| 28%   40C    P2    38W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   7  TITAN V             Off  | 00000000:40:00.0 Off |                  N/A |
| 31%   45C    P2    40W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
|   8  TITAN V             Off  | 00000000:41:00.0 Off |                  N/A |
| 29%   43C    P2    41W / 250W |   1087MiB / 12066MiB |    100%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0    131331      C   python                                      1076MiB |
|    1    131331      C   python                                      1076MiB |
|    2    131331      C   python                                      1076MiB |
|    3    131331      C   python                                      1076MiB |
|    4    131331      C   python                                      1076MiB |
|    5    131331      C   python                                      1076MiB |
|    6    131331      C   python                                      1076MiB |
|    7    131331      C   python                                      1076MiB |
|    8    131331      C   python                                      1076MiB |
+-----------------------------------------------------------------------------+
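Every card except GPU 0 sits at 100% utilization while nothing progresses, so my guess is that it is stuck in the initial parameter broadcast rather than in the model itself. This is the reduced test I am planning to run to check that (just a sketch; as far as I understand, torch.cuda.comm.broadcast is the same primitive that DataParallel's replicate step goes through):

import torch
import torch.cuda.comm as comm

num_gpus = torch.cuda.device_count()
x = torch.randn(1024, 1024).cuda(0)

# 1) Broadcast one tensor to every card at once, which is roughly what
#    DataParallel does when it replicates the parameters in forward().
print("broadcast to all %d GPUs ..." % num_gpus)
comm.broadcast(x, list(range(num_gpus)))
torch.cuda.synchronize()
print("ok")

# 2) If the step above hangs, try each pair of cards separately to see
#    whether only specific GPU pairs are affected.
for i in range(num_gpus):
    for j in range(num_gpus):
        if i != j:
            print("broadcast %d -> %d ..." % (i, j))
            comm.broadcast(x.cuda(i), [i, j])
            print("ok")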

I have tried the solution in this, but it didn’t work.

I use (PyTorch and CUDA versions double-checked with the snippet after this list):

  • CUDA 9.1.85
  • PyTorch 0.4.1 (installed via pip)
  • Python 2.7.13
  • Debian 4.9.110-3+deb9u1 (2018-08-03) x86_64 GNU/Linux
  • 9 TITAN V cards
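Just to rule out a mismatched install (note that torch.version.cuda reports the CUDA version the wheel was built against, which may differ from the system toolkit):

import torch

print("torch   : %s" % torch.__version__)
print("CUDA    : %s" % torch.version.cuda)
print("cuDNN   : %s" % torch.backends.cudnn.version())
print("GPUs    : %d" % torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print("  %d: %s" % (i, torch.cuda.get_device_name(i)))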

Any ideas on how to solve this? Or should I raise the issue with NVIDIA instead?
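
One more thought: if the hang is in NCCL peer-to-peer transfers between the cards (just a guess on my part, not confirmed), would disabling P2P like below be a reasonable thing to try? NCCL_P2P_DISABLE is a standard NCCL environment variable; it has to be set before the first DataParallel forward initializes NCCL, so it would go at the very top of the script (or be exported in the shell before launching). The rest of the script stays unchanged.

import os
os.environ["NCCL_P2P_DISABLE"] = "1"   # untested workaround idea: make NCCL copy through host memory instead of P2P

import torch
import torch.nn as nn
# ... rest of the training script as above ...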