I am trying to convert the transfer_learning_tutorial_multigpu.ipynb example to run on multiple GPUs, in my case [0, 1].
I can control the single-GPU case with …
> model = model.cuda(dev_id)
> inputs, labels = Variable(inputs).cuda(dev_id), Variable(labels).cuda(dev_id)
> outputs = model(inputs)
> _, preds = torch.max(outputs.data, 1)
> loss = criterion(outputs, labels)
Everything (model, inputs, labels) lives on gpu 0 or gpu 1.
Now I get very confused when using DataParallel.
I have some success with forcing everything onto 1 gpu.
dev_id = 0  # or 1
model = torch.nn.DataParallel(model, device_ids=[dev_id]).cuda(dev_id)
inputs, labels = Variable(inputs).cuda(dev_id), Variable(labels).cuda(dev_id)
outputs = model(inputs)
_, preds = torch.max(outputs.data, 1)
loss = criterion(outputs, labels)
But when I move to multiple GPUs, my understanding (and success) falls apart.
model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()
inputs, labels = Variable(inputs), Variable(labels).cuda(0)
outputs = model(inputs)
_, preds = torch.max(outputs.data, 1)
loss = criterion(outputs, labels)
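For reference, here is the loop above as a self-contained sketch of what I am attempting. `net`, `criterion`, and the random batch are stand-ins for the tutorial's model and data (not the tutorial's actual code), I've dropped the `Variable` wrapper since plain tensors work in current PyTorch, and I've added a CPU fallback so the logic runs even with fewer than two GPUs:

```python
import torch
import torch.nn as nn

# Stand-ins for the tutorial's model and loss.
net = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()

if torch.cuda.device_count() >= 2:
    # DataParallel replicates the module across device_ids on each
    # forward pass; .cuda() puts the master parameters on GPU 0.
    model = nn.DataParallel(net, device_ids=[0, 1]).cuda()
    inputs = torch.randn(8, 10).cuda(0)          # batch gets scattered across GPUs
    labels = torch.randint(0, 2, (8,)).cuda(0)
else:
    # CPU fallback so the same loop can be checked anywhere.
    model = net
    inputs = torch.randn(8, 10)
    labels = torch.randint(0, 2, (8,))

outputs = model(inputs)                # gathered back onto one device
_, preds = torch.max(outputs.data, 1)  # class predictions per sample
loss = criterion(outputs, labels)
```

This runs for me, but I still don't understand the device placement rules it relies on.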
I’ve seen the imagenet example, but I still do not understand.
Why do I need the first .cuda()?
How do I properly put my input on a GPU? Or do I need to at all? The examples do not use an input.cuda().
Can I always expect (if available) my output to be on device_id=0?
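For that last question, this is the probe I've been using (the `nn.Linear(4, 2)` module is just a stand-in, and it is guarded so it only runs when two GPUs are actually visible). My expectation, which I'd like confirmed, is that the gathered output lands on device_ids[0]:

```python
import torch
import torch.nn as nn

# Probe: which device does the gathered DataParallel output live on?
if torch.cuda.device_count() >= 2:
    model = nn.DataParallel(nn.Linear(4, 2), device_ids=[0, 1]).cuda()
    out = model(torch.randn(6, 4).cuda(0))
    print(out.get_device())  # I expect 0, i.e. device_ids[0]
```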
It seems so simple, but…