How to train this model on multiple GPUs

Hi all,

I have a model that consists of two parts. The first part, model1, takes an image and outputs a feature, model1_feat. The second part, model2, takes model1_feat and another feature, input_feat, as input and generates the final output. I want to train this model on multiple GPUs. I have written the following code:

model1 = nn.DataParallel(model1).cuda()
model1_feat = model1(input_image)

model2 = nn.DataParallel(model2).cuda()
model2_feat = model2(model1_feat, input_feat)

But it does not work. The whole thread blocks and the model never produces any output. Can you help me?
BTW, the whole model works fine on a single card.


Instead of creating two models, you can create just one model that contains both parts. Then you can simply wrap that model with nn.DataParallel.
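To make the single-model suggestion concrete, here is a minimal sketch. The names Combined, Model1 and Model2, and all layer sizes, are made up for illustration and are not the original poster's code. The idea is only that both parts run inside one forward(), so one nn.DataParallel wrapper handles the whole pipeline. The sketch falls back to a single device when fewer than two GPUs are present.

```python
import torch
import torch.nn as nn


class Combined(nn.Module):
    """Hypothetical wrapper that runs model1 and model2 in one forward pass."""

    def __init__(self, model1, model2):
        super().__init__()
        self.model1 = model1
        self.model2 = model2

    def forward(self, input_image, input_feat):
        model1_feat = self.model1(input_image)
        return self.model2(model1_feat, input_feat)


# Stand-in sub-models, invented for this sketch only.
class Model1(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 4)

    def forward(self, x):
        return self.fc(x)


class Model2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4 + 3, 2)

    def forward(self, feat, extra):
        # Concatenate the two features along the channel dimension.
        return self.fc(torch.cat([feat, extra], dim=1))


model = Combined(Model1(), Model2())
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()  # replicate on all visible GPUs
elif torch.cuda.is_available():
    model = model.cuda()

device = next(model.parameters()).device
input_image = torch.randn(16, 8, device=device)  # toy stand-in for an image batch
input_feat = torch.randn(16, 3, device=device)
output = model(input_image, input_feat)
print(output.shape)
```

Both inputs are split along the batch dimension, so each replica gets matching slices of input_image and input_feat.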

The .cuda() accepts a device id. So you could assign the GPUs as:

model1 = nn.DataParallel(model1).cuda(device=0)
model1_feat = model1(input_image)

model2 = nn.DataParallel(model2).cuda(device=1)
model2_feat = model2(model1_feat, input_feat)

Your current setup replicates both of your models on all devices and splits the data across them.

Thanks, my code runs on multiple GPUs without modification, but the GPU memory usage is extremely unbalanced.

Honestly, your code raises an error: "RuntimeError: all tensors must be on devices[0]". I don't think we can put the second part on device 1 like that.

Your model is like a conditional GAN. I'm also running some experiments like yours.
I think you should first put both models on multiple GPUs, and then, in the training procedure, feed model1_feat and input_feat to model2, like this:

model1 = nn.DataParallel(model1).cuda()
model2 = nn.DataParallel(model2).cuda()
# in training procedure
model1_feat = model1(input_image)
model2_feat = model2(model1_feat, input_feat)

You can select the GPUs on the command line with CUDA_VISIBLE_DEVICES=0,1.

As far as I know, you cannot pass tensors between different GPUs this way at run time.

Thanks for your reply. I think you are right. But the problem is that the GPU memory usage is extremely unbalanced. The first GPU consumes a lot of memory while the others use only a little. For example:
| 0 22043 C /usr/bin/python 11138MiB |
| 1 22043 C /usr/bin/python 5724MiB |
| 2 22043 C /usr/bin/python 5548MiB |
| 3 22043 C /usr/bin/python 5613MiB |

Any ideas?

I'm also confused by your problem. Can you share some parameters of your code, such as the batch size?

Batch_size = 16. It is the largest batch_size that the machine could support.

I'm also confused; maybe someone else can help.
Maybe you could provide more details of your code, if convenient.
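One possible explanation, offered as a guess rather than a diagnosis of the code above: nn.DataParallel gathers the outputs of all replicas on the first device, and if the loss is computed there too, that GPU holds extra tensors. A commonly suggested mitigation is to compute the loss inside forward(), so that only one small loss tensor per GPU is gathered. The wrapper name ModelWithLoss and the toy layer sizes below are assumptions made for this sketch.

```python
import torch
import torch.nn as nn


class ModelWithLoss(nn.Module):
    """Hypothetical wrapper: the loss is computed on the replica that ran the forward pass."""

    def __init__(self, model, criterion):
        super().__init__()
        self.model = model
        self.criterion = criterion

    def forward(self, inputs, targets):
        outputs = self.model(inputs)
        # Return a 1-element tensor so the per-GPU losses can be concatenated.
        return self.criterion(outputs, targets).view(1)


net = ModelWithLoss(nn.Linear(8, 2), nn.MSELoss())  # toy model and loss
if torch.cuda.device_count() > 1:
    net = nn.DataParallel(net).cuda()
elif torch.cuda.is_available():
    net = net.cuda()

device = next(net.parameters()).device
inputs = torch.randn(16, 8, device=device)
targets = torch.randn(16, 2, device=device)

# One loss value per GPU; averaging them matches the full-batch mean
# only when every GPU received the same number of samples.
loss = net(inputs, targets).mean()
loss.backward()
```

Whether this evens out memory in a given setup depends on how large the model outputs are compared with the activations, so it is worth measuring with nvidia-smi before and after.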

Hi, can I train or test one model using multiple GPUs?
Currently, when I train my model, only one GPU is used, so training is slow.
Thank you!

First, wrap your model: nn.DataParallel(model)
Then launch it from the command line with: CUDA_VISIBLE_DEVICES=0,1 python <your script>
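One detail that is easy to trip over: inside the process, the GPUs listed in CUDA_VISIBLE_DEVICES are renumbered from 0 in the order given. The helper below is a plain-Python illustration of that renumbering (visible_to_physical is a made-up name, and it assumes the variable holds integer ids rather than GPU UUIDs). It does not touch CUDA at all.

```python
def visible_to_physical(cuda_visible_devices):
    """Map the device index a process sees to the physical GPU id it refers to."""
    ids = [int(x) for x in cuda_visible_devices.split(",") if x.strip()]
    return dict(enumerate(ids))


mapping = visible_to_physical("2,3,4")
print(mapping)  # {0: 2, 1: 3, 2: 4}
```

So with CUDA_VISIBLE_DEVICES=2,3,4, any device_ids passed to nn.DataParallel should be 0, 1, 2 and not 2, 3, 4.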

gpu_ids = [2, 3, 4]
torch.cuda.set_device(gpu_ids[0])  # fixes "RuntimeError: all tensors must be on devices[0]"

To use multiple GPUs, set pin_memory=True in the dataset loader and wrap the model:
model = torch.nn.DataParallel(model, device_ids=gpu_ids)

For variables during training, use:
target_var = target.cuda(non_blocking=True)  # was async=True before PyTorch 0.4
input_var = input.cuda(non_blocking=True).requires_grad_()  # only if gradients w.r.t. the input are needed

For variables at test time, disable autograd (this replaces volatile=True):
with torch.no_grad():
    target_var = target.cuda(non_blocking=True)
    input_var = input.cuda(non_blocking=True)



I find that you can just implement one model class and use torch.nn.DataParallel to train it in parallel.

>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)

For your reference.

Hi, but when I try this I get an error in my loss function. Maybe the targets remain on GPU 1 while the model outputs are on GPU 0.
This is the error I get in my loss function:
buffer[torch.eq(target, -1.)] = 0
RuntimeError: invalid argument 2: sizes do not match at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generated/…/generic/
I don't think this is an error in my code; it only appears when I use data parallelism. (I ran my less intensive code both with and without data parallelism, and it throws the same error only with data parallelism.)

My model is memory-intensive and I have 2 GPUs with 12206MiB each. I just need to split my model so that it uses both GPUs during training as well as testing.

BTW, my model is an FCN and its batch size is 1.
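A note on the batch size of 1: as far as I understand, nn.DataParallel splits each input along the batch dimension in the same way torch.chunk does, so a batch with a single sample cannot be divided and only one GPU gets work. The plain-Python helper below (chunk_sizes is a made-up name for this sketch) mimics that splitting rule to show what each GPU would receive.

```python
import math


def chunk_sizes(batch_size, num_gpus):
    """Sizes of the pieces obtained when a batch is split the way torch.chunk splits a dimension."""
    if batch_size == 0:
        return []
    step = math.ceil(batch_size / num_gpus)  # size of every chunk except possibly the last
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(step, remaining))
        remaining -= sizes[-1]
    return sizes


print(chunk_sizes(16, 4))  # [4, 4, 4, 4]
print(chunk_sizes(1, 2))   # [1]
```

Splitting the model itself across two GPUs (placing different layers on different devices) is a different technique from data parallelism and is not something nn.DataParallel does.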

I tried to do what you suggested, but I get the following error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generic/ line=58 error=2 : out of memory
*** Error in `python': free(): invalid pointer: 0x00007f2a53091780 ***

My GPU isn't out of memory, though.