torch.nn.DataParallel does not accept None for device_ids in 0.1.11

I just updated to 0.1.11 and I think the API for calling torch.nn.DataParallel has changed. In the previous version, if I only had one GPU I would call the function with None passed as device_ids. Now if I pass None I get the following error:

  File "/home/jtremblay/anaconda2/envs/py3/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 96, in data_parallel
    output_device = device_ids[0]
TypeError: 'NoneType' object is not subscriptable

documentation link to the function: http://pytorch.org/docs/nn.html?highlight=parallel#torch.nn.DataParallel


Same problem when directly calling nn.parallel.data_parallel.

Hey @jtremblay and @meijieru

I am trying to replicate this issue, but I can't.
Can either of you give me a script to reproduce it?

Here is the script I used, along with its output:

import torch
from torch.autograd import Variable
import torch.nn as nn
import platform

print('Python version: ' +  platform.python_version())
print(torch.__version__)


print('Trying out device_ids=None')

model = nn.Linear(20, 30).cuda()
net = torch.nn.DataParallel(model, device_ids=None)


inp = Variable(torch.randn(128, 20).cuda(), requires_grad=True)
out = net(inp)
out.backward(torch.ones(out.size()).cuda())

print('Passed')

Output:

Python version: 3.6.0
0.1.11+b13b701
Trying out device_ids=None
Passed

Actually, the bug only applies when using the functional version, nn.parallel.data_parallel.
I’ve identified the issue and fixed it in https://github.com/pytorch/pytorch/pull/1187
It should be in our next release.
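
Concretely, a minimal sketch of the failing call on 0.1.11 (the layer sizes and input are just illustrative):

import torch
from torch.autograd import Variable
import torch.nn as nn

model = nn.Linear(20, 30).cuda()
inp = Variable(torch.randn(128, 20).cuda())

# The functional form reaches `output_device = device_ids[0]` while
# device_ids is still None, hence the TypeError above.
out = nn.parallel.data_parallel(model, inp, device_ids=None)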

For now, you can do:

device_ids = list(range(torch.cuda.device_count()))
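
For example, plugged into the functional call (again a sketch, with an illustrative model and input):

import torch
from torch.autograd import Variable
import torch.nn as nn

# Enumerate the visible GPUs explicitly instead of passing None.
device_ids = list(range(torch.cuda.device_count()))

model = nn.Linear(20, 30).cuda()
inp = Variable(torch.randn(128, 20).cuda())
out = nn.parallel.data_parallel(model, inp, device_ids=device_ids)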


Sorry for the late reply, I am travelling. I should have provided an example or done a PR. I have been using the device IDs for now; it was an easy fix for upgrading my scripts. Thank you so much for your time.

Hello,

I am getting a Torch: unable to mmap memory: you tried to mmap 0GB error. I have 12 GB of RAM, one GPU, and the dataset is 7 GB, so ideally this error should not occur. I think I am making a mistake with cuda and DataParallel, but I am unable to figure it out. The attached image contains the details. Please help!