While using nn.DataParallel, only one GPU is accessed

I have 2 GPUs in my system. I am using the DataParallel module over my model, and I have made both GPUs visible with os.environ["CUDA_VISIBLE_DEVICES"] = "0,1". But while running, only GPU 0 is used if it is listed first in CUDA_VISIBLE_DEVICES, or only GPU 1 if it is listed first. What could be the reason why both GPUs are not used together?
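A minimal, self-contained sketch of the setup (with a small nn.Linear standing in for my actual model) looks like this:

import os
# make both GPUs visible; this must happen before torch initializes CUDA
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

import torch as t
import torch.nn as nn

model = nn.Linear(10, 10)                        # stand-in for my actual model
device_ids = list(range(t.cuda.device_count()))  # expecting [0, 1]
model = nn.DataParallel(model, device_ids=device_ids)
model = model.cuda()

x = t.randn(8, 10).cuda()  # a batch of 8 that should be split across both GPUs
out = model(x)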

Then I tried manually creating replicas, but I am getting this error:
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion output_nr == 0 failed.
It happens in the LSTM part of my code:

replicas = t.nn.parallel.replicate(model, device_ids)
inputs = t.nn.parallel.scatter((dropout, encoder_word_input, encoder_character_input,
                                decoder_word_input, decoder_character_input, None),
                               device_ids)
replicas = replicas[:len(inputs)]
outputs = t.nn.parallel.parallel_apply(replicas, inputs)
out = t.nn.parallel.gather(outputs, output_device)

Did you specify device_ids?

Yes, I have specified it, but it is still not working.
In the forward function of my model I take multiple inputs: one is a floating-point number, four are Variables of shape minibatch x feature_size, and the last is a latent vector.
Could that be the reason only 1 GPU is being used?
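For reference, a hypothetical stand-in with the same kind of forward signature (the argument names mirror my real model; the body here is just a toy LSTM, not my actual RVAE):

import torch as t
import torch.nn as nn

class RVAESketch(nn.Module):
    # hypothetical stand-in, not my real RVAE
    def __init__(self, feature_size=8):
        super().__init__()
        self.rnn = nn.LSTM(feature_size, feature_size, batch_first=True)

    def forward(self, drop_prob, encoder_word_input, encoder_character_input,
                decoder_word_input, decoder_character_input, z=None):
        # drop_prob is a plain float, the four *_input arguments are
        # minibatch x feature_size tensors, z is an optional latent vector
        out, _ = self.rnn(encoder_word_input.unsqueeze(1))
        return out.squeeze(1)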

Hmm… Can you show us the lines where you use DataParallel in your code?

rvae = RVAE(parameters)
device_ids = [i for i in range(t.cuda.device_count())]
rvae = t.nn.DataParallel(rvae, device_ids)
rvae = rvae.cuda()

and the forward call is the following:

rvae(0., encoder_word_input, encoder_character_input,
     decoder_word_input, decoder_character_input,
     z=None)

Just remove rvae = rvae.cuda(). I think that is the problematic part.

That gave me an error: Expected object of type torch.LongTensor but found type torch.cuda.LongTensor for argument #3 'index'

How do you know only one GPU is working? Is the other one completely empty?
Could you add a print statement in your forward method, showing the current device of the tensor?
You can find a small example here. You would have to change it to print the device instead of the shape.
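Something along these lines (a minimal sketch with a toy model; in your case the print would go inside your RVAE's forward):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x):
        # with two working replicas you should see both cuda:0 and cuda:1 printed
        print("forward running on", x.device)
        return self.fc(x)

model = nn.DataParallel(Net()).cuda()
out = model(torch.randn(8, 10).cuda())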

I am using watch nvidia-smi to check the memory usage. Only one GPU shows any usage; the other stays idle.

I checked what you suggested by calling get_device() inside the forward function. Only one GPU is being used.

I also tried manually creating replicas:

replicas = t.nn.parallel.replicate(rvae, device_ids)  # rvae is my model
inputs = t.nn.parallel.scatter((dropout, encoder_word_input, encoder_character_input,
                                decoder_word_input, decoder_character_input, None),
                               device_ids)
replicas = replicas[:len(inputs)]
outputs = t.nn.parallel.parallel_apply(replicas, inputs)
out = t.nn.parallel.gather(outputs, output_device)

But I am getting this error:
RuntimeError: torch/csrc/autograd/variable.cpp:115: get_grad_fn: Assertion output_nr == 0 failed.
It happens in the LSTM part of my code.

Sorry, you're right. You do need to send the DataParallel model to the GPU.

What is your batch size? You need a batch_size > 1 to use both GPUs.
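As a rough sketch (not your model): DataParallel splits the input tensor along dim 0, so with two GPUs a batch of 8 gives each replica 4 samples, while a batch of 1 cannot be split at all.

import torch
import torch.nn as nn

model = nn.DataParallel(nn.Linear(10, 10)).cuda()

x = torch.randn(8, 10).cuda()   # batch of 8 -> about 4 samples per GPU
out = model(x)

x1 = torch.randn(1, 10).cuda()  # batch of 1 -> nothing to split, only one GPU runs
out1 = model(x1)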

I am facing the same problem and my batch_size is > 1. I am using two NVIDIA P100 GPUs on Google Kubernetes Engine.
@Tony_Gracious, may I know if you have solved this problem?

No, I didn't solve it.

@Tony_Gracious in my case, it was because I had initially trained the model using nn.DataParallel with one GPU; when I later reloaded the model, DataParallel still stored the previous device_ids, hence only the single GPU was used. I now solve it by re-wrapping my model with nn.DataParallel every time I load it:

model = _load_model()
model = nn.DataParallel(model.module)

You also need to be careful about which dimension your data is batched on. I had to change model = nn.DataParallel(model.module) into model = nn.DataParallel(model.module, dim=1), since I am using batch_first=False.
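Putting it together, a minimal sketch of that reload path (_load_model is just a placeholder for however you restore your checkpoint; the dim=1 variant only applies when the batch dimension is dim 1, e.g. inputs of shape seq_len x batch x features with batch_first=False):

import torch.nn as nn

model = _load_model()                  # placeholder: returns the previously wrapped DataParallel model
model = nn.DataParallel(model.module)  # re-wrap so the current device_ids are picked up
# model = nn.DataParallel(model.module, dim=1)  # use this if your batch dimension is dim 1
model = model.cuda()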


device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
rvae = RVAE(parameters)
rvae = nn.DataParallel(rvae)
rvae.to(device)

Has anyone solved this? I am running into the same situation.

I think I have solved this problem.
If you wrap your model as model = DataParallel(model) and pass arguments into its forward(), then according to the PyTorch documentation:

Arbitrary positional and keyword inputs are allowed to be passed into DataParallel EXCEPT Tensors. All tensors will be scattered on dim specified (default 0). Primitive types will be broadcasted, but all other types will be a shallow copy and can be corrupted if written to in the model’s forward pass.

which means that if an input argument is a tensor, it is split along dim=0 (the batch dimension). For other types like a Python list/dict/str, DataParallel.forward() automatically copies it to N replicas (N being your number of GPUs).
The key point is that if you pass an argument like
[torch.tensor]
or
{"example": torch.tensor}
then, even though these are a Python list/dict, DataParallel.forward() does not handle the tensors inside them properly (and it won't raise an error). So the fix is simply to convert those arguments (and all their elements) to plain Python types.
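A minimal sketch of that workaround (the behavior described above is the previous poster's observation; the model and argument names here are purely illustrative): pass the extra value as a plain Python float instead of wrapping a tensor in a list or dict.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(10, 10)

    def forward(self, x, scale):
        # x is a tensor and gets scattered along dim 0;
        # scale is a plain Python float and is broadcast to every replica
        return self.fc(x) * scale

model = nn.DataParallel(Net()).cuda()
x = torch.randn(8, 10).cuda()

# pass the extra argument as a plain float rather than e.g. [torch.tensor(0.5)]
out = model(x, 0.5)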


What do you mean by Python types?