Does backward happen on the master GPU or on each GPU?

When using torch.nn.DataParallel for multi-GPU training, does loss.backward() only compute gradients for the model replica on the master GPU, or does it compute gradients on each GPU and then merge them?

The latter.
In particular, the intermediate results needed for backward are typically on the GPU where you did the forward.
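A minimal sketch to illustrate this (assuming a machine with at least two CUDA GPUs; the model and tensor sizes are just placeholders):

```python
import torch
import torch.nn as nn

# The original module lives on cuda:0 (the "master" GPU).
model = nn.Linear(10, 1).cuda()
# DataParallel replicates it onto all visible GPUs for each forward pass.
dp_model = nn.DataParallel(model)

x = torch.randn(32, 10, device="cuda:0")  # batch gets scattered across GPUs
y = torch.randn(32, 1, device="cuda:0")

loss = nn.functional.mse_loss(dp_model(x), y)
# backward runs on each replica, using the intermediate results stored on the
# GPU that did the corresponding forward; the per-replica gradients are then
# reduced onto the original module's parameters.
loss.backward()

print(model.weight.grad.device)  # cuda:0 -- the merged gradient ends up on the master GPU
```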

Best regards

Thomas


Got it, thanks for your reply :)