Moving nn.Modules between GPUs

In a multi-GPU setting, I need to move tensors to a single GPU for loss computation. My loss module looks like this:

class PerceptualLoss(nn.Module):
    def __init__(self):
        super(PerceptualLoss, self).__init__()
        self.vgg_16_mean = Variable(...).cuda().float()  # some value
        self.vgg_16_std = Variable(...).cuda().float()   # some value

class TotalLoss(nn.Module):
    def __init__(self):
        super(TotalLoss, self).__init__()
        self.vgg_loss = PerceptualLoss()

I get results from my network’s forward pass using nn.DataParallel.

In my main module, I have:

loss_module = TotalLoss().cuda(1)  # place on GPU 1
results = net(input)
results = results.cuda(1)  # move to GPU 1 (tensor .cuda() is not in-place)
ground_truth = ground_truth.cuda(1)

total_loss = loss_module(results, ground_truth)

Now I get an “arguments are located on different GPUs” error, because .cuda(1) didn’t apply to the VGG-16 mean and standard deviation Variables.

So I have two questions:
A) I’m moving the loss to a specific GPU in a multi-GPU setup with DataParallel, based on what was suggested here. Are there other ways of handling the memory imbalance? I tried making the loss computation part of the forward pass, but that threw “arguments on different GPUs” errors as well.

B) How do I make a .cuda(int) call on an nn.Module apply to the tensors held by its child modules?

Register your mean and std as buffers (via register_buffer) instead of assigning them as plain Variable attributes. Buffers are moved by .cuda() / .to() together with the module’s parameters, and they are saved in the state_dict without being trained.
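For example, here is a minimal sketch of PerceptualLoss with the statistics registered as buffers. The ImageNet normalization values and the forward body are placeholders, not your actual statistics or loss:

import torch
import torch.nn as nn
import torch.nn.functional as F

class PerceptualLoss(nn.Module):
    def __init__(self):
        super(PerceptualLoss, self).__init__()
        # Buffers are not trainable, but Module.cuda()/.to() moves them and they
        # end up in the state_dict. The values below are placeholders (ImageNet
        # stats); substitute whatever you currently put in the Variables.
        self.register_buffer('vgg_16_mean',
                             torch.tensor([0.485, 0.456, 0.406]).view(1, 3, 1, 1))
        self.register_buffer('vgg_16_std',
                             torch.tensor([0.229, 0.224, 0.225]).view(1, 3, 1, 1))

    def forward(self, results, ground_truth):
        # Buffers are ordinary attributes and already live on the module's
        # device, so no per-call .cuda(1) is needed here.
        results = (results - self.vgg_16_mean) / self.vgg_16_std
        ground_truth = (ground_truth - self.vgg_16_mean) / self.vgg_16_std
        # ... your VGG feature extraction / perceptual loss would go here ...
        return F.mse_loss(results, ground_truth)  # stand-in for the real loss

With the buffers registered, TotalLoss().cuda(1) moves them to GPU 1 as well, since .cuda() / .to() recurse into child modules and include their parameters and buffers. That also answers B): the call already reaches child modules, it just can’t see tensors stored as plain attributes.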