Custom methods in DistributedDataParallel

hij · April 16, 2020, 10:18pm

I am trying to do multi-GPU training with DistributedDataParallel, which I wrap around my model. However, my model has a custom function that I now call with model.module.function(x). Is this OK, or will something bad happen? Thanks.

mrshenli · April 16, 2020, 10:54pm

What does this custom function do, and when do you call it? If it does not modify the parameters or the autograd graph built during the forward pass, it should be OK.

hij · April 16, 2020, 11:29pm

The pseudo code is something like this

output = model(input)
output2 = model(input2)
final_output = model.module.function(output, output2)
loss = loss_function(final_output)
optimizer.zero_grad()
loss.backward()
optimizer.step()

Would this be fine? The custom function is just an MLP that classifies something. It does not change anything, but I want it to be updated when I call optimizer.step().

mrshenli · April 17, 2020, 12:51am

If model is a DistributedDataParallel (DDP) instance, this won't work, because DDP sets up some internal state at the end of the forward pass and does not work if you call forward twice without a backward in between.

However, this can easily be solved by wrapping the two forward calls and the function invocation in a wrapper model and then passing that wrapper model to DDP, something like:

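import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

class WrapperModel(nn.Module):
  def __init__(self, model):
    super(WrapperModel, self).__init__()
    self.model = model  # the plain model, not yet wrapped in DDP

  def forward(self, input, input2):
    output = self.model(input)
    output2 = self.model(input2)
    # self.model is not a DDP instance here, so there is no .module indirection
    final_output = self.model.function(output, output2)
    return final_output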

ddp = DistributedDataParallel(WrapperModel(model).to(device), device_ids=[device])

final_output = ddp(input, input2)  # call the module itself rather than .forward()
loss = loss_function(final_output)
optimizer.zero_grad()
loss.backward()
optimizer.step()
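
For anyone who wants to try the wrapper pattern without a multi-GPU machine, here is a minimal sketch that runs in a single CPU process. Everything outside WrapperModel is an assumption made for illustration: ToyModel and its layer sizes are hypothetical stand-ins for the real model, the gloo backend with a file-based store and world_size=1 is only there to exercise the API, and the squared-output loss is a placeholder.

```python
import os
import tempfile

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel


class ToyModel(nn.Module):
    """Hypothetical stand-in for the user's model with a custom method."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(4, 4)
        self.head = nn.Linear(8, 2)  # plays the role of the custom MLP

    def forward(self, x):
        return self.backbone(x)

    def function(self, a, b):
        return self.head(torch.cat([a, b], dim=1))


class WrapperModel(nn.Module):
    """Runs both forward calls and the custom method inside one forward()."""

    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, input, input2):
        output = self.model(input)
        output2 = self.model(input2)
        return self.model.function(output, output2)


def run_single_process_demo():
    # Assumption: one CPU process, gloo backend, file-based rendezvous.
    store_file = os.path.join(tempfile.mkdtemp(), "ddp_store")
    dist.init_process_group(
        "gloo", init_method=f"file://{store_file}", rank=0, world_size=1
    )
    try:
        torch.manual_seed(0)
        ddp = DistributedDataParallel(WrapperModel(ToyModel()))
        optimizer = torch.optim.SGD(ddp.parameters(), lr=0.1)

        final_output = ddp(torch.randn(3, 4), torch.randn(3, 4))
        loss = final_output.pow(2).mean()  # placeholder loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Report which parameters received a gradient.
        return {name: p.grad is not None for name, p in ddp.named_parameters()}
    finally:
        dist.destroy_process_group()
```

Calling run_single_process_demo() returns a dict keyed by parameter name. Because the custom method is invoked inside the wrapper's forward, the head parameters should appear with a gradient alongside the backbone ones.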

hij · April 17, 2020, 1:16am

I set broadcast_buffers=False, so I didn't have an issue calling forward twice. In that case, is it fine to call my custom function the way I did, and will the gradients be correct?

mrshenli · April 17, 2020, 2:23am

If model.module.function does not use the parameters in the model, it should work.

hij · April 17, 2020, 2:47am

A few more details on my method. The pseudocode is:

class model(nn.Module):
  def __init__(self) :
    super(model, self).__init__()
    self.encoder = Encoder()
    self.decoder = Decoder()
    self.mlp = MLP()
  def encode(self, x):
    return self.encoder(x)
  def decode(self, x): 
    return self.decoder(x)
  def classify(self, a, b):
    return self.mlp(a, b)
  def forward(self, x):
    enc = self.encode(x)
    out = self.decode(enc)
    return enc, out
# this is my main training script
enc, out = model(x)
enc2 = enc + d  # d is a random perturbation
out2 = model.module.decode(enc2)
pred = model.module.classify(enc, enc2)

There is a bunch of other stuff, but in this scenario my decode function is using the parameters in model. Would this be an issue? There are no errors when running.

mrshenli · April 17, 2020, 3:32am

How do you compute the final loss (the one that backward is launched from)? I assume both enc and out contribute to that loss? If so, this looks OK to me.

This should not be an issue for your current use case, but I want to mention that this probably won't work correctly in find_unused_parameters=True mode, because the mlp is used outside of forward and DDP finds unused parameters by traversing from the forward output. In that mode, DDP would treat the parameters in mlp as unused even though they are actually part of the autograd graph.
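
To make the point about mlp concrete, here is a rough sketch in plain PyTorch, with no DDP involved. It walks the autograd graph backwards from the outputs of forward and lists the parameters it can reach. This is only an illustration of the idea, not DDP's actual implementation. The Net class, its layer sizes, and the params_reachable_from helper are hypothetical names made up for this example.

```python
import torch
import torch.nn as nn


class Net(nn.Module):
    """Hypothetical miniature of the encoder/decoder/mlp model in this thread."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Linear(4, 3)
        self.decoder = nn.Linear(3, 4)
        self.mlp = nn.Linear(6, 2)

    def classify(self, a, b):
        return self.mlp(torch.cat([a, b], dim=1))

    def forward(self, x):
        enc = self.encoder(x)
        return enc, self.decoder(enc)


def params_reachable_from(outputs, module):
    """Names of parameters found by walking the autograd graph back from outputs."""
    names = {id(p): n for n, p in module.named_parameters()}
    stack = [o.grad_fn for o in outputs if o.grad_fn is not None]
    seen, found = set(), set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        # Leaf parameters appear as AccumulateGrad nodes exposing `.variable`.
        var = getattr(node, "variable", None)
        if var is not None and id(var) in names:
            found.add(names[id(var)])
        stack.extend(fn for fn, _ in node.next_functions if fn is not None)
    return found


net = Net()
enc, out = net(torch.randn(2, 4))
pred = net.classify(enc, enc + 0.1)  # mlp is used outside forward

reachable = params_reachable_from((enc, out), net)
unreached = sorted(set(dict(net.named_parameters())) - reachable)
print(unreached)
```

The mlp parameters end up in unreached even though pred depends on them, which is the situation described above: a search that starts only from the forward output cannot see work done after forward returns.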

hij · April 17, 2020, 5:41am

My loss functions are something like:

loss1 = adv_loss(out) #make output image look realistic
loss2 = adv_loss(out2)
loss3 = adv_loss(enc) #make encoding normal distributed
loss4 = adv_loss(enc2)
loss5 = l1_loss(out, x) # reconstruction loss
loss6 = l1_loss(out2, x)
loss7 = cross_entropy_loss(pred, GT)

I don't have find_unused_parameters=True and get no errors. If I understand what you are saying, the gradients are fine?

mrshenli · April 17, 2020, 2:08pm

Yes, I think the gradients should be fine.