How can the gradient values for each input in a batch be accessed?


I am trying to get the gradient values for each input in the batch simultaneously.

More specifically, I need the mean of the squared gradients over the inputs in the batch.

I think this is possible if the number of GPUs is the same as the batch size, because each GPU then processes exactly one input of the batch.

Here is a simple script.

import torch
import torch.nn as nn
import torch.nn.functional as F

# A batch of 4 inputs with 10 features each, and one class label per input.
input = torch.randn(4, 10).cuda()
target = torch.ones(4).long().cuda()

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.n1 = nn.Linear(10, 10)
        self.n2 = nn.Linear(10, 2)

    def forward(self, x):
        x = self.n1(x)
        x = F.log_softmax(self.n2(x), dim=1)
        return x

# One replica per GPU, so each GPU receives exactly one input of the batch.
model = Net().cuda()
model = torch.nn.DataParallel(model, device_ids=[0, 1, 2, 3])
output = model(input)

# reduction='none' keeps one loss value per input instead of averaging them.
loss = F.nll_loss(output, target, reduction='none')
# Backpropagate all per-input losses in a single call.
torch.autograd.backward([element for element in loss])

As you can see, 4 GPUs are used and the batch size is 4.
So each GPU evaluates the gradient for one input.

My question is how the gradient values of each input can be accessed.

Thanks in advance for your help.

Well, as no smarter person is replying, I will.

You can probably register backward hooks. Each time the hooked layer is reached during the backward pass, the hook is called with that layer's gradients.

The problem is that the hook will be called once per GPU, and I have no hints about how to deal with that.
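To make the hook idea concrete, here is a minimal sketch on a single device, which sidesteps the multi-GPU issue rather than solving it. It assumes PyTorch 1.8 or later, where `register_full_backward_hook` is available. The names `grads` and `save_grad` are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

layer = nn.Linear(10, 2)
grads = []  # hypothetical container, filled by the hook below


def save_grad(module, grad_input, grad_output):
    # grad_output[0] is the gradient of the loss w.r.t. the layer output.
    # It has one row per sample in the batch.
    grads.append(grad_output[0].detach().clone())


handle = layer.register_full_backward_hook(save_grad)

# requires_grad=True so that gradients also flow back through the layer input.
x = torch.randn(4, 10, requires_grad=True)
target = torch.ones(4).long()

# reduction='sum' keeps the per-sample rows unscaled by the batch size.
loss = F.nll_loss(F.log_softmax(layer(x), dim=1), target, reduction='sum')
loss.backward()
handle.remove()  # detach the hook once it is no longer needed

print(len(grads), grads[0].shape)
```

Note that the hook sees gradients with respect to the layer output, not the per-sample parameter gradients themselves; those still have to be derived from it.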

Thanks for your reply.

I will try as you suggested.

Still no smarter person than @JuanFMontesinos replying, but I’m gray-haired enough to recommend Efficient computation of per-sample examples and the discussion in issue #15359 for more inspiration.
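As a rough illustration of how per-sample gradients can be obtained without one GPU per input, here is my own sketch for a single `nn.Linear` layer. It is not code taken from the references above. It relies on the fact that, for a linear layer, the weight gradient of one sample is the outer product of that sample's output gradient and its input. The variable names are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

layer = nn.Linear(10, 2)
x = torch.randn(4, 10)
target = torch.ones(4).long()

out = layer(x)
loss = F.nll_loss(F.log_softmax(out, dim=1), target, reduction='sum')

# Gradient of the summed loss w.r.t. the layer output: one row per sample.
(grad_out,) = torch.autograd.grad(loss, out)

# Per-sample weight gradients as outer products, shape (batch, out, in).
per_sample_w = torch.einsum('bo,bi->boi', grad_out, x)

# Mean of the squared per-sample gradients, shape (out, in).
mean_sq_w = per_sample_w.pow(2).mean(dim=0)

# Reference: one backward pass per sample, for comparison only.
ref = []
for i in range(x.shape[0]):
    loss_i = F.nll_loss(
        F.log_softmax(layer(x[i:i + 1]), dim=1),
        target[i:i + 1],
        reduction='sum',
    )
    (g,) = torch.autograd.grad(loss_i, layer.weight)
    ref.append(g)
ref = torch.stack(ref)

print(torch.allclose(per_sample_w, ref, atol=1e-6))
```

The loop is only there to check the result; the einsum line needs a single backward pass for the whole batch.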

Best regards