blackyang (Xiao Yang) #1
Hi, I have two loss functions whose return values are Variables wrapping a CPU tensor and a GPU tensor, respectively. Therefore I cannot do:
loss = loss1 + loss2
loss.backward()
because loss1.data is a CPU tensor while loss2.data is a GPU tensor. How do I back-propagate correctly? Thanks!
SimonW (Simon Wang) #2
loss1.gpu() + loss2
or
loss1 + loss2.cpu()
or
loss1.backward(); loss2.backward()
etc.
blackyang (Xiao Yang) #3
But a Variable doesn't have gpu() or cpu(), right?
The third method is slow in my case, because loss1 and loss2 share large subgraphs below them, so two separate backward calls traverse the shared part twice.
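For reference, here is roughly what that two-backward version looks like (a minimal sketch with a toy stand-in for the shared subgraph; the first call needs retain_graph, called retain_variables in older releases, so the second backward can reuse the graph):

import torch
from torch.autograd import Variable

x = Variable(torch.rand(10), requires_grad=True)
shared = (x * 2).tanh()          # stand-in for the expensive shared subgraph
loss1 = shared.sum()             # first loss built on the shared part
loss2 = (shared ** 2).sum()      # second loss reusing the same subgraph

# Two separate backward passes: the first must retain the graph so the
# second can run, and the shared subgraph is walked both times.
loss1.backward(retain_graph=True)
loss2.backward()                 # gradients accumulate into x.grad
print(x.grad)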
blackyang (Xiao Yang) #4
Just found a quick hack: suppose loss1 is the Variable holding the CPU tensor; then we can directly set
loss1.data = loss1.data.cuda()
The gradients are also correct. Verified by a simple toy example:
import torch
from torch.autograd import Variable

x1 = Variable(torch.rand(10), requires_grad=True)         # CPU leaf
x2 = Variable(torch.rand(10).cuda(), requires_grad=True)  # GPU leaf

x1.data = x1.data.cuda()  # move the underlying storage to the GPU in place

y = x1 + 2 * x2
y.backward(y.data.clone().fill_(1))  # seed the backward pass with ones

print(x1.grad)
print(x2.grad)
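(For y = x1 + 2 * x2 the chain rule gives dy/dx1 = 1 and dy/dx2 = 2, so x1.grad should print as all ones and x2.grad as all twos, both now CUDA tensors since x1's storage was moved to the GPU.)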
SimonW (Simon Wang) #5
Sorry, I meant .cuda() instead of .gpu().
colesbury (Sam Gross) #6
Variable has a .cuda() and a .cpu() method. Gradients are also correctly back-propagated through the call.
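So the clean fix for the original question is simply to move one loss onto the other's device before summing. A minimal sketch, with toy losses standing in for the real loss1 and loss2:

import torch
from torch.autograd import Variable

x1 = Variable(torch.rand(10), requires_grad=True)          # CPU leaf
x2 = Variable(torch.rand(10).cuda(), requires_grad=True)   # GPU leaf

loss1 = (x1 * 3).sum()   # CPU-side loss
loss2 = (x2 * 2).sum()   # GPU-side loss

# .cuda() on a Variable is differentiable, so one backward pass suffices.
total = loss1.cuda() + loss2
total.backward()

print(x1.grad)  # all threes, still on the CPU
print(x2.grad)  # all twos, on the GPU

Going the other way with loss1 + loss2.cpu() works the same way.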