I have a model which contains two parts. The first part, "model1", takes one image and outputs a feature 'model1_feat'. The second part, "model2", takes 'model1_feat' and another feature 'input_feat' as input, and generates the final output. I want to train this model on multiple GPUs. I have written the following code:
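For context, here is a minimal sketch of how the two parts described above might be defined; the layer choices and feature sizes are placeholders, not the actual model:

import torch
import torch.nn as nn

class Model1(nn.Module):
    # takes an image and produces model1_feat (layers are placeholders)
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
    def forward(self, image):
        return self.backbone(image).flatten(1)   # model1_feat, shape (N, 64)

class Model2(nn.Module):
    # takes model1_feat and input_feat and produces the final output
    def __init__(self, feat_dim=64, input_feat_dim=32, out_dim=10):
        super().__init__()
        self.head = nn.Linear(feat_dim + input_feat_dim, out_dim)
    def forward(self, model1_feat, input_feat):
        return self.head(torch.cat([model1_feat, input_feat], dim=1))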
Your model is like a conditional GAN; I'm also doing some experiments like yours.
I think you should put both models on multiple GPUs first, and in the training procedure feed model1_feat and input_feat to model2, like this:
import torch.nn as nn

model1 = nn.DataParallel(model1).cuda()
model2 = nn.DataParallel(model2).cuda()
# in the training procedure
model1_feat = model1(input_image)
output = model2(model1_feat, input_feat)  # final output
You can select the GPUs on the command line, e.g. CUDA_VISIBLE_DEVICES=0,1.
As far as I know, you cannot pass tensors between different GPUs during the forward pass.
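A fuller training step under this setup might look like the sketch below; the optimizer, criterion, and the 'loader' variable are assumptions, not details from this thread:

import torch
import torch.nn as nn

model1 = nn.DataParallel(model1).cuda()
model2 = nn.DataParallel(model2).cuda()
optimizer = torch.optim.Adam(
    list(model1.parameters()) + list(model2.parameters()), lr=1e-4)  # assumed optimizer
criterion = nn.MSELoss()  # assumed loss

for input_image, input_feat, target in loader:  # 'loader' is a placeholder DataLoader
    input_image, input_feat, target = input_image.cuda(), input_feat.cuda(), target.cuda()
    model1_feat = model1(input_image)         # scattered across GPUs, gathered back on GPU 0
    output = model2(model1_feat, input_feat)  # final output
    loss = criterion(output, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()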
Thanks for your reply. I think you are right. But the problem is that the GPU memory usage is extremely unbalanced. The first GPU consumes a lot of memory while the others use only a little. For example:
| 0 22043 C /usr/bin/python 11138MiB |
| 1 22043 C /usr/bin/python 5724MiB |
| 2 22043 C /usr/bin/python 5548MiB |
| 3 22043 C /usr/bin/python 5613MiB |
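That imbalance is expected with nn.DataParallel: inputs are scattered across the GPUs, but the gathered outputs, the loss, and the backward buffers all sit on the first device. A common mitigation, sketched here as a suggestion rather than something from this thread, is to compute the loss inside the wrapped module so that only small per-replica loss values are gathered back to GPU 0 (the criterion below is an assumption):

import torch.nn as nn

class Model2WithLoss(nn.Module):
    # wraps the plain (unwrapped) model2 and the criterion so the loss is
    # computed on each replica's own GPU
    def __init__(self, model2, criterion):
        super().__init__()
        self.model2 = model2
        self.criterion = criterion
    def forward(self, model1_feat, input_feat, target):
        output = self.model2(model1_feat, input_feat)
        return self.criterion(output, target)  # the large 'output' never leaves its GPU

model2 = nn.DataParallel(Model2WithLoss(model2, nn.MSELoss())).cuda()
# in the training procedure
loss = model2(model1_feat, input_feat, target).mean()  # mean over the per-GPU losses
loss.backward()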
Hi, but when I try this I get an error in my loss function; maybe the targets remain on GPU 1 while the model outputs are on GPU 0.
This is the error I get in my loss function:
buffer[torch.eq(target, -1.)] = 0
RuntimeError: invalid argument 2: sizes do not match at /opt/conda/conda-bld/pytorch_1512946747676/work/torch/lib/THC/generated/…/generic/THCTensorMasked.cu:13
This is not an error in my code but one that only appears after enabling data parallelism (I ran my less intensive code both with and without data parallelism, and it throws the same error only when data parallelism is used).
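If the suspicion is that targets and outputs end up on different devices (or with different sizes after scattering), a quick sanity check right before the failing line might look like this; 'output' and 'target' stand for the tensors the loss function receives, and 'buffer' for whatever tensor is being masked:

# hypothetical check before the failing masked assignment
target = target.to(output.device)   # keep target on the same GPU as the output
assert buffer.shape == target.shape, (buffer.shape, target.shape)
buffer[torch.eq(target, -1.)] = 0   # the original masked assignment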
My model is memory-intensive and I have 2 GPUs with 12206 MiB each. I just need to split my model across both GPUs during training as well as testing.
By the way, my model is an FCN and its batch size is 1.
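With a batch size of 1, nn.DataParallel (which splits the batch across GPUs) will not help; splitting the model itself across the two GPUs is model parallelism. A minimal sketch, with the split point and submodule names as assumptions:

import torch.nn as nn

class SplitFCN(nn.Module):
    # places the first half of the network on cuda:0 and the second half on cuda:1
    def __init__(self, part1, part2):
        super().__init__()
        self.part1 = part1.to('cuda:0')
        self.part2 = part2.to('cuda:1')
    def forward(self, x):
        x = self.part1(x.to('cuda:0'))
        return self.part2(x.to('cuda:1'))   # move the intermediate activation to the second GPU

# the loss must then be computed against targets on the final output's device:
# loss = criterion(split_fcn(image), target.to('cuda:1'))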