Use DataParallel for gradient calculation and weight updates


#1

DataParallel can split a large batch into smaller batches
and run them on different GPUs.

I don’t know how to use DataParallel to complete the gradient calculation and weight updates.
Also,
the weight updates depend on the average of the gradients from the different GPUs.

My specific code:

class Model(nn.Module):
    def train(self, input):
        output = self.fc(input)
        return output

    def compute_loss(self, output, target):
        return Loss_compute(output, target)

    def forward(self, data, target, optim):
        out = self.train(data)

        loss = self.compute_loss(out, target)  # shape (B,)

        loss.mean(0).backward()
        optim.step()

        return loss


gpus = [0, 1, 2, 3]
model = nn.DataParallel(model, device_ids=gpus, dim=0)  # dim=0 splits along the batch dimension
optim = Optim(optimer, learning_rate)  # custom Optim wrapper class


for data, target in rand_loader:
    data = data.cuda()
    target = target.cuda()
    model(data, target, optim)

I think using DataParallel to update the weights will reduce the training time.
However,
how do I use the averaged gradients to update the weights?
I hope that I am clear.
Thanks!!!


#2

DataParallel doesn’t support having the optimizer inside the forward function.

Instead, keep forward limited to computing and returning the loss, and run these lines as part of your training loop:

optim.zero_grad()
loss = model(data, target)
loss.mean(0).backward()
optim.step()
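For reference, here is a minimal end-to-end sketch of that pattern. The layer sizes, the MSE loss, and the SGD optimizer are illustrative assumptions, not from your code; the point is only the structure: forward returns a per-sample loss of shape (B,), and zero_grad/backward/step live in the training loop.

```python
import torch
import torch.nn as nn

class Model(nn.Module):
    """Forward computes and returns only the per-sample loss;
    the backward pass and optimizer step stay in the training loop."""
    def __init__(self, in_features=10, out_features=1):
        super().__init__()
        self.fc = nn.Linear(in_features, out_features)

    def forward(self, data, target):
        out = self.fc(data)
        # reduction="none" keeps one loss per sample, shape (B, 1);
        # mean(dim=1) gives shape (B,), which DataParallel gathers from all GPUs
        return nn.functional.mse_loss(out, target, reduction="none").mean(dim=1)

model = Model()
if torch.cuda.is_available():
    # replicate the model and split each batch along dim 0 across the GPUs
    model = nn.DataParallel(model.cuda(),
                            device_ids=list(range(torch.cuda.device_count())))
optim = torch.optim.SGD(model.parameters(), lr=0.01)

# stand-in for one batch from rand_loader
data = torch.randn(8, 10)
target = torch.randn(8, 1)
if torch.cuda.is_available():
    data, target = data.cuda(), target.cuda()

optim.zero_grad()
loss = model(data, target)  # shape (B,): per-sample losses gathered from all GPUs
loss.mean(0).backward()     # averaging over the batch also averages across the
                            # GPU shards; gradients land on the wrapped module
optim.step()
```

Because `loss.mean(0)` averages the per-sample losses collected from every replica, the single `backward()` call already produces the averaged gradient you asked about; you don't average gradients manually.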

#3

Thanks for your reply.
Now I understand what you mean.
Thanks!!!