nn.DataParallel in PyTorch 0.4.0 not working

The nn.DataParallel tutorial code from PyTorch works for me when I run it.
However, in my own code it doesn't work:

CUDA_VISIBLE_DEVICEs=1,3,4 python train.py
GPU_COUNT = torch.cuda.device_count() # 3
device = torch.device("cuda")
model.load_weights(model_path)
model = torch.nn.DataParallel(model)
model.to(device)

However, when I monitor the memory usage in the terminal, the model is only loaded onto the first GPU.
Any solution?

I’d be surprised if this worked at all. Try replacing the line

model = torch.nn.DataParallel

with

model = torch.nn.DataParallel(model)

Note that I load pretrained weights that were trained on a single GPU (i.e. the model was not wrapped in nn.DataParallel) before wrapping the model in nn.DataParallel.

Don’t know if that matters :sweat_smile:

My mistake, it's actually

 model = torch.nn.DataParallel(model)

How large is your batch size?
Also, the statement CUDA_VISIBLE_DEVICEs=1,3,4 won't have any effect.
If you would like to set it in your script, you should do it before importing torch or any other library that initializes CUDA, using os.environ['CUDA_VISIBLE_DEVICES'] = '1,3,4' (see the sketch below).
However, I would recommend setting it as an env variable in your terminal by executing your script via:

CUDA_VISIBLE_DEVICES=1,3,4 python script.py args
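
If you do want to set it from within the script instead, a minimal sketch of that in-script variant (the script layout is just an example):

import os
# Must be set before torch (or anything else that initializes CUDA) is imported,
# otherwise the mask is ignored. Note that the value is a string.
os.environ['CUDA_VISIBLE_DEVICES'] = '1,3,4'

import torch
print(torch.cuda.device_count())  # should report the 3 masked devices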

Sorry for not stating it clearly :rofl:
I've actually set it in my environment settings, so that shouldn't be the problem.
I'm just trying to say that I have three GPUs for training, but only one of them is doing any work.
And as mentioned above, the pretrained model isn't the cause of the problem, since I've tried without loading the pretrained weights and the problem still exists.
The only setting that works at all is when, while wrapping the model in nn.DataParallel, I set the device IDs:

device = torch.device("cuda:1,3,4")
model = torch.nn.DataParallel(model, device_id=[1,3,4])
input_data.cuda() is changed to input_data.to(device)

Something really weird is that:

  1. I set three GPUs; however, only two GPUs are triggered (and I still get an error caused by another problem: variable 'delta' is on another GPU, i.e. the second GPU wants to compute something with 'delta', but the variable lives on a different GPU device).

  2. The default setting for device_ids is to use all GPUs.

  3. .cuda() is actually the same as .to(device).

So I really don't know what's going on :thinking::thinking::thinking::thinking:

  1. If you set CUDA_VISIBLE_DEVICEs=1,3,4, your GPUs will be masked and internally available as cuda:0, cuda:1, and cuda:2.

  2. That should be the default, as seen in this line of code.

  3. .cuda() will push your model/tensors to the default device. to() is the newer style and I would recommend using it, as it's easier to write device-agnostic code with it in my opinion (see the sketch below).
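
For example, a small device-agnostic sketch (the linear layer and shapes are just placeholders):

import torch
import torch.nn as nn

# Pick whatever is available; the rest of the code never hard-codes a device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = nn.Linear(10, 2).to(device)    # placeholder model
data = torch.randn(4, 10).to(device)   # input follows the same device
output = model(data)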

The first point you mentioned might be the reason why only two devices are triggered.
So should I change all the data.cuda() calls to data.to(device)?

[ERROR]
Variable 'delta' is on another GPU, i.e. the second GPU
wants to compute something with 'delta', but the variable is on
a different GPU device

I've simplified my project as follows to describe the error above:

class model_A(nn.Module):
...
   def forward(self, input):
      delta = ......
      # delta is a new tensor created inside forward and then moved with .cuda()
      delta = delta.cuda()

class model_B(nn.Module):
...

class model_C(nn.Module):
   def __init__(self, .....):
      super().__init__()
      self.M_A = model_A(...)
      self.M_B = model_B(...)
   def forward(self, input):
      A, B, C = self.M_A(input)
      D = self.M_B(A, B)
      ...

Currently you are pushing delta to the default device manually (cuda:0 if you haven’t used a context manager to specify the device).
I assume model_A is wrapped in nn.DataParallel, which might trigger this error, since input will be on the specified devices.
If you need to create a new tensor inside your forward, you should push it to the current device:

delta = torch.randn(1, device=input.device)
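
For example, a minimal sketch of how that could look inside your forward (the layer and tensor shapes are made up):

import torch
import torch.nn as nn

class ModelA(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(10, 10)

    def forward(self, x):
        # x arrives already scattered onto one of the DataParallel replicas,
        # so create delta on that same device instead of calling .cuda().
        delta = torch.randn(x.size(0), 10, device=x.device)
        return self.linear(x) + delta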

I only do something like:

model = model_C(...)
model = nn.DataParallel(model)

Does that mean model_A is also wrapped in nn.DataParallel???

So do you mean that my delta.cuda() only pushes the data to the default GPU (cuda:0) instead of all GPUs? Do I have to use delta.to(device) with device = torch.device("cuda:0,1,2")?

In the multi-GPU tutorial from PyTorch, the device is set to "cuda:0":

device = torch.device("cuda:0")

And it seems that this is the same as device = torch.device("cuda"), which will contain all devices.

Yes, all children will also be wrapped in nn.DataParallel.

You can't specify multiple devices using cuda() or to().
As described in the tutorial, you push the model onto the default device, which is where e.g. all gradients are accumulated, from where the data is scattered, etc.
That's also the reason you'll usually see a bit more memory usage on the default device.
Have a look at @Thomas_Wolf’s blog post on this topic for more information.
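
For reference, a rough sketch of that usual pattern (placeholder model and shapes):

import torch
import torch.nn as nn

device = torch.device("cuda:0")     # default / output device
model = nn.Linear(10, 2)            # placeholder model
model = nn.DataParallel(model)      # replicated onto all visible GPUs on each forward
model.to(device)                    # parameters live on cuda:0

x = torch.randn(32, 10).to(device)  # push the batch to the default device
out = model(x)                      # scattered across GPUs, gathered back on cuda:0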

Thanks for your kind reply.
I finally found the problem; let me rewrite my code:

CUDA_VISIBLE_DEVICEs=1,3,4 python train.py

class model_A(nn.Module):
...
   def forward(self, input):
      delta = ......
      delta = delta.cuda()

class model_B(nn.Module):
...

class model_C(nn.Module):
   def __init__(self, .....):
      super().__init__()
      self.M_A = model_A(...)
      self.M_B = model_B(...)

   def train_model(self, ...):
      # DataLoader
      ...
      for input in DataLoader:
           image, .... = input
           C = self.M_A(image)
           out = self.M_B(C)
           loss = criterion(out, label)
           ...

# Train Process:
device = torch.device("cuda:0")
model.load_weights(model_path)
model = torch.nn.DataParallel(model)
model.to(device)
model.module.train_model(...)   # calls train_model directly, bypassing DataParallel.forward

No matter how I set my GPUs, it still kept training on the first GPU only.
The reason is that I wrote a train_model method instead of a forward method and trained the model by calling model.module.train_model.
It's fine to train by calling model.train_model on a single GPU; however, for multi-GPU training, model.module.train_model (where model is wrapped in nn.DataParallel) never goes through DataParallel's forward method, so the model is not replicated to the other devices and the input is not scattered.
To solve this, I wrote a forward method and moved the training loop outside the model_C class (see the sketch below).
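
Roughly, the restructured version now looks like this; a runnable sketch with toy stand-ins (nn.Linear sub-modules and random data) in place of my real models and data:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

class ModelC(nn.Module):
    def __init__(self):
        super().__init__()
        self.M_A = nn.Linear(10, 10)   # stand-ins for the real sub-models
        self.M_B = nn.Linear(10, 2)

    def forward(self, image):
        # Going through forward lets nn.DataParallel replicate the model
        # and scatter the batch across the GPUs.
        return self.M_B(self.M_A(image))

# Training loop lives outside the model class:
device = torch.device("cuda:0")
model = nn.DataParallel(ModelC()).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
loader = DataLoader(dataset, batch_size=32)

for image, label in loader:
    out = model(image.to(device))           # calls DataParallel.forward
    loss = criterion(out, label.to(device))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()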

Still, is there any way to train the model with something like model.train_model on multiple GPUs???
Many thanks!!
