In a multi-GPU setting, I need to move tensors onto one GPU for the loss computation. My loss module looks like this:
class PerceptualLoss(nn.Module):
    def __init__(self):
        super(PerceptualLoss, self).__init__()
        # fixed VGG-16 normalization stats, stored as plain attributes
        self.vgg_16_mean = Variable(...).cuda().float()  # some value
        self.vgg_16_std = Variable(...).cuda().float()   # some value

class TotalLoss(nn.Module):
    def __init__(self):
        super(TotalLoss, self).__init__()
        self.vgg_loss = PerceptualLoss()
I get results from my network’s forward pass using DataParallel.
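For context, the wrapping looks roughly like this (the stand-in model and device IDs are just placeholders, not my actual network):

import torch.nn as nn

# stand-in for my actual network (placeholder only)
net = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1))
# replicate across GPUs 0 and 1; the gathered output lands on device_ids[0], i.e. GPU 0
net = nn.DataParallel(net, device_ids=[0, 1]).cuda(0)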
In my main module, I have:
loss_module = TotalLoss().cuda(1)   # place on GPU 1
results = net(input)
results = results.cuda(1)           # .cuda() is not in-place, so reassign
ground_truth = ground_truth.cuda(1) # move to GPU 1
total_loss = loss_module(results, ground_truth)
Now I get an “arguments on different GPUs” error, because .cuda(1) didn’t apply to the VGG-16 mean and standard deviation Variables.
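As far as I can tell, that’s because the mean and std are plain attributes rather than registered parameters/buffers, so Module.cuda() never touches them. A minimal sketch of the difference (the Demo class is just an illustration, not my code):

import torch
import torch.nn as nn
from torch.autograd import Variable

class Demo(nn.Module):
    def __init__(self):
        super(Demo, self).__init__()
        # plain attribute: invisible to Module.cuda(1)
        self.mean_attr = Variable(torch.zeros(3))
        # registered buffer: moved together with the module
        self.register_buffer('mean_buf', torch.zeros(3))

m = Demo().cuda(1)
print(m.mean_attr.data.is_cuda)  # False -- stayed on the CPU
print(m.mean_buf.is_cuda)        # True  -- moved to GPU 1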
So I have two questions:
A) I’m moving the loss to a specific GPU in a multi-GPU DataParallel setup based on what was suggested here. Are there other ways of handling the memory imbalance? I tried making the loss computation part of the forward pass (roughly as sketched after question B below), but it threw the same arguments-on-different-GPUs error.
B) How do I make a .cuda(int) call on an nn.Module apply to its child modules?
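For reference, “making the loss part of the forward pass” in question A looked roughly like the sketch below (names are illustrative, not my exact code): a wrapper module that runs the network and the loss together, so that under DataParallel each replica computes its share of the loss on its own GPU.

import torch.nn as nn

class NetWithLoss(nn.Module):
    # illustrative wrapper, not my exact code
    def __init__(self, net, loss_module):
        super(NetWithLoss, self).__init__()
        self.net = net
        self.loss = loss_module

    def forward(self, input, ground_truth):
        results = self.net(input)
        # each replica evaluates the loss on its own GPU; DataParallel
        # gathers the per-replica losses on the output device
        return self.loss(results, ground_truth)

# wrapped = nn.DataParallel(NetWithLoss(net, TotalLoss()), device_ids=[0, 1]).cuda(0)
# total_loss = wrapped(input, ground_truth).mean()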