In a multi-GPU setting, I need to move tensors onto a single GPU for loss computation. My loss module looks like this:
```python
class PerceptualLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.vgg_16_mean = Variable( ... ).cuda().float()  # some value
        self.vgg_16_std = Variable( ... ).cuda().float()   # some value


class TotalLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.vgg_loss = PerceptualLoss()
```
I get results from my network's forward pass using DataParallel.
In my main module, I have:
```python
loss_module = TotalLoss().cuda(1)      # place on GPU 1
results = net(input)
results = results.cuda(1)              # .cuda() is not in-place, so reassign
ground_truth = ground_truth.cuda(1)
total_loss = loss_module(results, ground_truth)
```
Now I get an "arguments are located on different GPUs" error, because `.cuda(1)` did not apply to the vgg_16 mean and standard deviation Variables.
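Here is a minimal CPU-only repro of what I believe is going on (module and attribute names are made up; `.double()` stands in for `.cuda(1)`, since both recurse only through registered parameters and buffers): a tensor stored as a plain attribute is invisible to module-wide moves and casts, while a registered buffer is picked up.

```python
import torch
import torch.nn as nn

class WithPlainAttr(nn.Module):
    def __init__(self):
        super().__init__()
        # plain attribute: not registered, so .cuda()/.to()/.double() ignore it
        self.mean = torch.zeros(3)

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        # registered buffer: moved and cast together with the module
        self.register_buffer("mean", torch.zeros(3))

a = WithPlainAttr().double()  # stand-in for .cuda(1)
b = WithBuffer().double()
print(a.mean.dtype)  # torch.float32 -- unchanged
print(b.mean.dtype)  # torch.float64 -- converted with the module
```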
So I have two questions:
A) I'm moving the loss to a specific GPU in a multi-GPU DataParallel setup, based on what was suggested here. Are there other ways of handling the memory imbalance? I tried making the loss computation part of the forward pass, but that threw the same "arguments on different GPUs" error.
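For reference, the "loss inside the forward pass" pattern I tried looks roughly like this sketch (the `nn.Linear` network and MSE loss are placeholders for my actual network and `TotalLoss`): the wrapper returns per-sample losses, so when it is wrapped in DataParallel the loss computation is scattered across GPUs along with the forward pass instead of piling up on one device.

```python
import torch
import torch.nn as nn

class NetWithLoss(nn.Module):
    """Wraps a network and its loss so DataParallel scatters both."""
    def __init__(self, net):
        super().__init__()
        self.net = net
        self.loss = nn.MSELoss(reduction="none")  # placeholder loss

    def forward(self, x, target):
        out = self.net(x)
        # return one loss value per sample; reduce after gather
        return self.loss(out, target).view(x.size(0), -1).mean(dim=1)

net = nn.Linear(4, 4)          # placeholder network
wrapped = NetWithLoss(net)
# wrapped = nn.DataParallel(wrapped)  # in the actual multi-GPU setting

x = torch.randn(8, 4)
target = torch.randn(8, 4)
per_sample = wrapped(x, target)  # shape: (8,)
loss = per_sample.mean()         # scalar to backprop through
```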
B) How do I make a `.cuda(int)` call on an `nn.Module` apply to its child modules and their tensor attributes?