When using torch.nn.DataParallel
for multi-GPU training, does loss.backward()
only compute the gradients for the model replica on the master GPU, or does it compute the gradients on each GPU and then merge them?
The latter.
In particular, the intermediate results needed for the backward pass typically live on the GPU where the corresponding forward pass ran.
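To make this concrete, here is a minimal sketch of that behavior (assuming a machine with at least two GPUs; the layer and batch size are just placeholders):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1).cuda(0)        # parameters live on GPU 0 (the "master")
dp_model = nn.DataParallel(model)       # replicas are created on each forward call

x = torch.randn(8, 10, device="cuda:0")  # the batch is scattered across the GPUs
y = dp_model(x)                          # forward runs on each replica's GPU
loss = y.sum()

loss.backward()  # backward runs per replica, on the GPU that holds its
                 # intermediate activations; the per-replica gradients are
                 # then reduced onto the parameters on GPU 0

print(model.weight.grad.device)  # cuda:0 -- the merged gradient on the master GPU
```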
Best regards
Thomas
Got it, thanks for your reply! :)