Error in multi-GPU training

akr90 · June 13, 2017, 10:33pm

From the ImageNet example, it looks like multi-GPU training is pretty simple. All you need to do is add

net = torch.nn.DataParallel(net,device_ids=[0,1,2,3])
net.cuda()

and you are good to go. Is this correct, or do I need to do something else as well?
I did this and I am getting this error:

RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84

CUDA_VISIBLE_DEVICES is properly set. Can someone tell me the fix?

Thanks,
A

smth · June 22, 2017, 4:18am

You probably don't have 4 GPUs.

You can set device_ids=None (or not specify it at all) and it'll use all available GPUs.
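
To check this before wrapping the model, something like the sketch below may help. It compares the requested indices against torch.cuda.device_count(), which counts only the GPUs visible to the process. Note that CUDA_VISIBLE_DEVICES renumbers devices from 0, so with CUDA_VISIBLE_DEVICES=2,3 the valid ids are 0 and 1, not 2 and 3. The helper names valid_device_ids and wrap_data_parallel are made up for illustration and are not part of PyTorch.

```python
try:
    import torch  # optional, so the pure-Python helper can be tried without PyTorch
except ImportError:
    torch = None


def valid_device_ids(requested, visible_count):
    # Keep only the indices that exist among the GPUs visible to this process.
    return [i for i in requested if 0 <= i < visible_count]


def wrap_data_parallel(net, requested=None):
    # Hypothetical helper: wrap net in DataParallel using only GPUs that exist.
    if torch is None or not torch.cuda.is_available():
        return net  # no usable GPU, leave the model where it is
    count = torch.cuda.device_count()
    if requested is None:
        ids = list(range(count))  # same effect as device_ids=None
    else:
        ids = valid_device_ids(requested, count)
    if not ids:
        raise ValueError(
            "none of the requested GPUs %r exist (%d visible)" % (requested, count)
        )
    # DataParallel expects the module's parameters on device_ids[0].
    return torch.nn.DataParallel(net.cuda(ids[0]), device_ids=ids)
```

For example, on a machine with two visible GPUs, wrap_data_parallel(net, requested=[0, 1, 2, 3]) would fall back to devices 0 and 1 instead of raising "invalid device ordinal". Silently dropping ids is only one possible choice; raising an error on any missing id may be safer if you rely on a fixed GPU count.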