Use DataParallel for gradient calculation and weight updates


#1

DataParallel can split a large batch into smaller batches
and run them on different GPUs.

I don’t know how to use DataParallel to complete the gradient calculation and weight updates.
Also,
the weight updates depend on the average of the gradients from the different GPUs.

My specific code:

class Model(nn.Module):
    def train(self, input):
        output = self.fc(input)
        return output

    def compute_loss(self, output, target):
        return Loss_compute(output, target)

    def forward(self, data, target, optim):
        out = self.train(data)

        loss = self.compute_loss(out, target)  # shape (B,)

        loss.mean(0).backward()
        optim.step()

        return loss


gpus = [0, 1, 2, 3]
model = nn.DataParallel(model, device_ids=gpus, dim=0)  # dim=0 splits along the batch dimension
optim = Optim(optimer, learning_rate)  # custom Optim wrapper class


for data, target in rand_loader:
    data = data.cuda()
    target = target.cuda()
    model(data, target, optim)

I think using DataParallel to update the weights will reduce the training time.
However,
how do I use the averaged gradients to update the weights?
I hope that I am clear.
Thanks!!!


#2

DataParallel doesn’t support having the optimizer inside the forward function.

Instead, keep forward limited to computing and returning the loss, and run these lines as part of your training loop:

optim.zero_grad()
loss = model(data, target)
loss.mean(0).backward()
optim.step()
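For reference, here is a minimal end-to-end sketch of that pattern. The layer sizes, the MSE loss, and the SGD optimizer are illustrative assumptions, not from your code; the point is only the structure: forward returns a per-sample loss of shape (B,), and zero_grad/backward/step live in the training loop.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Forward computes and returns only the per-sample loss;
    the backward pass and optimizer step stay in the training loop."""
    def __init__(self, in_features=10, out_features=1):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, data, target):
        out = self.fc(data)
        # reduction="none" keeps one loss per sample, shape (B, 1);
        # mean(dim=1) gives shape (B,), which DataParallel gathers from all GPUs
        return nn.functional.mse_loss(out, target, reduction="none").mean(dim=1)

model = Model()
if torch.cuda.is_available():
    # replicate the model and split each batch along dim 0 across the GPUs
    model = nn.DataParallel(model.cuda(),
                            device_ids=list(range(torch.cuda.device_count())))
optim = torch.optim.SGD(model.parameters(), lr=0.01)

# stand-in for one batch from rand_loader
data = torch.randn(8, 10)
target = torch.randn(8, 1)
if torch.cuda.is_available():
    data, target = data.cuda(), target.cuda()

optim.zero_grad()
loss = model(data, target)  # shape (B,): per-sample losses gathered from all GPUs
loss.mean(0).backward()     # averaging over the batch also averages across the
                            # GPU shards; gradients land on the wrapped module
optim.step()
```

Because `loss.mean(0)` averages the per-sample losses collected from every replica, the single `backward()` call already produces the averaged gradient you asked about; you don't average gradients manually.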

#3

Thanks for your reply.
Now I understand what you mean.
Thanks!!!