nn.DataParallel can split a large batch into smaller chunks and run each chunk on a different GPU.
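For context, this is how I understand the scatter/gather behavior (a minimal sketch; the sizes and the linear module are just placeholders):

import torch
import torch.nn as nn

net = nn.Linear(10, 2)
net = nn.DataParallel(net, device_ids=[0, 1, 2, 3]).cuda()
x = torch.randn(32, 10).cuda()  # the batch of 32 is scattered as 4 chunks of 8
y = net(x)                      # per-GPU outputs are gathered back, shape (32, 2)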
I don't know how to use DataParallel to carry out the gradient computation and the weight update, given that the weight update should depend on the average of the gradients computed on the different GPUs.
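To make the averaging concrete, here is a tiny CPU-only check of what I mean (no DataParallel involved; the toy loss is just an illustration): the gradient of the mean loss over the full batch equals the average of the per-chunk gradients when the chunks have equal size.

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(8, 3)

# gradient of the mean loss over the full batch
(x @ w).pow(2).mean().backward()
g_full = w.grad.clone()
w.grad = None

# average of the per-chunk gradients (like 4 GPUs, 2 samples each)
g_chunks = []
for chunk in x.chunk(4):
    (chunk @ w).pow(2).mean().backward()
    g_chunks.append(w.grad.clone())
    w.grad = None

print(torch.allclose(g_full, torch.stack(g_chunks).mean(0)))  # True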
My specific code:
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)  # a single linear layer, for illustration

    # Renamed from `train`: a method named `train` would shadow nn.Module.train().
    def predict(self, input):
        return self.fc(input)

    def compute_loss(self, output, target):
        return Loss_compute(output, target)  # my own per-sample loss, shape (B,)

    def forward(self, data, target, optim):
        out = self.predict(data)
        loss = self.compute_loss(out, target)  # shape (B,)
        loss.mean(0).backward()
        optim.step()
        return loss
gpus = [0, 1, 2, 3]
model = Model(in_dim, out_dim)
model = nn.DataParallel(model, device_ids=gpus, dim=0).cuda()  # dim=0: scatter along the batch dimension
optim = Optim(optimizer, learning_rate)  # Optim is my own optimizer wrapper class

for data, target in rand_loader:
    data = data.cuda()
    target = target.cuda()
    model(data, target, optim)
I expect that training with DataParallel will reduce the training time. However, how do I make the weight update use the gradients averaged across the GPUs?
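Here is the pattern I was considering (only a sketch; it assumes forward is changed to return the per-sample loss instead of calling optim.step() itself, and torch.optim.SGD stands in for my Optim wrapper):

model = nn.DataParallel(Model(in_dim, out_dim), device_ids=gpus, dim=0).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

for data, target in rand_loader:
    data, target = data.cuda(), target.cuda()
    per_sample_loss = model(data, target)  # each GPU computes losses for its chunk; gathered to shape (B,)
    loss = per_sample_loss.mean()          # mean over the full batch
    optimizer.zero_grad()
    loss.backward()                        # replica gradients are reduced onto the master weights here
    optimizer.step()                       # single update on the master copy

Is this the right way to get the averaged-gradient update, or does DataParallel need something more explicit?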
I hope this is clear.
Thanks!!!