After I call loss.backward(), PyTorch automatically reduces (averages) the gradients across all model replicas (towers) when I use torch.nn.DataParallel(). Is there any way to get each individual replica's (tower's) gradient?
>>> net = torch.nn.DataParallel(model, device_ids=[0, 1, 2])
>>> output = net(input_var)
>>> loss = criterion(output, label_var)
Thank you all, I got it by specializing the Gather function.
Btw, I wonder why the replicate function is called on every forward of the DataParallel module, rather than just once at init? https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/data_parallel.py#L67
Sorry, I made a mistake. I actually solved it by specializing the Broadcast function's backward (which is a reduce operation).
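For intuition, here is a toy, framework-free sketch of why Broadcast's backward is a reduce. All names here are my own invention, not PyTorch APIs, and no GPUs are involved: broadcasting one parameter to N replicas in the forward means each replica contributes its own gradient, so the backward must sum them, and stashing the per-replica gradients just before that sum is the hook point for inspecting each tower's gradient.

```python
def broadcast_forward(param, n_replicas):
    # Forward of a broadcast: copy the same parameter to every replica.
    return [param for _ in range(n_replicas)]

def broadcast_backward(replica_grads, capture=None):
    # Backward of a broadcast is a reduce: sum the per-replica gradients.
    # Optionally stash the individual gradients before reducing -- this is
    # the "specialization" point discussed above (hypothetical argument).
    if capture is not None:
        capture.extend(replica_grads)
    return sum(replica_grads)

capture = []
replicas = broadcast_forward(5.0, 3)          # [5.0, 5.0, 5.0]
total = broadcast_backward([1.0, 2.0, 3.0], capture)
print(total)    # 6.0 -- the reduced gradient the original parameter sees
print(capture)  # [1.0, 2.0, 3.0] -- each tower's gradient, preserved
```

In real DataParallel the reduce happens on the source device, but the shape of the computation is the same.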
I also ran into this problem. I thought it was Scatter::backward() that gathers the grads, until this post reminded me that it is actually Broadcast::backward().
Thanks very much:)