We sometimes reuse existent models to build loss module like perceptual loss and GAN’s adversarial loss. I would like to know that if the existent models are accelerated with nn.DataParallel, it is inefficient to use nn.DataParallel one more time for the loss Module which use the existent models.
For example, model1 and model2 compose cycle loss module as follows. In that case, is the code, “cycle_loss = nn.DataParallel(CycleLoss(model1, model2)).cuda()”, inefficient?
import torch
import torch.nn as nn
import torch.nn.functional as F
class Model(nn.Module):
def __init__(self):
super().__init__()
self.conv = nn.Conv2d(1,1,3,1,1)
def forward(input):
return F.relu(self.conv(input))
class CycleLoss(nn.Module):
def __init__(self, model1, model2):
super().__init__()
self.model1 = model1
self.model2 = model2
self.l1 = nn.L1Loss()
def forward(input1, input2):
loss = self.l1(self.model2(self.model1(input1)), input1)
loss += self.l1(self.model1(self.model2(input2)), input2)
return loss[None] # expand dim=0 to concatnate
# make model
model1 = nn.DataParallel(Model()).cuda()
model2 = nn.DataParallel(Model()).cuda()
# loss module
cycle_loss = nn.DataParallel(CycleLoss(model1, model2)).cuda()
# opt
opt = torch.optim.Adam(list(model1.parameters()) + list(model2.parameters()))
# train
for input1, input2 in data_loader:
opt.zero_grad()
loss = cycle_loss(input1, input2)
loss = loss.mean()
loss.backward()
opt.step()
I am afraid that nesting nn.DataParallel makes the code perform scatter and gather at each sub-module needlessly.
Thank you.