Getting per-sample gradients without summing

I was wondering whether there is a way to get a separate gradient for each data point. By default, autograd sums all the gradients into var.grad, but I would like to have the per-sample gradients before they are summed. In the example below, I have two data points, and I need two sets of gradients, one for each row of the batch.

import torch
import torch.nn as nn

class MnistFC(nn.Module):
  def __init__(self):
    super(MnistFC, self).__init__()
    self.fc = nn.Linear(28 * 28, 10)

  def forward(self, x):
    x = x.view(-1, 28 * 28)
    x = self.fc(x)
    return x

net = MnistFC()
x = torch.randn([2, 784])

out = net(x)
# Returns the gradients summed over the batch dimension:
torch.autograd.grad(out, net.parameters(), torch.ones([2, 10]), retain_graph=True)

To clarify: I want net.fc.weight.grad to have shape [2, 10, 784] instead of [10, 784]. A list of length two, where each element has shape [10, 784], would also be fine. One way to do this is a for loop over every single row of data, but that is not efficient. I am looking for an efficient way to compute these gradients with just one autograd.grad call.
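For a single linear layer like the one above, the per-sample weight gradient can be written in closed form: each sample's gradient is the outer product of that row's upstream gradient with its input row, so one einsum call produces all of them at once. This is a sketch of that idea, not a general solution for deeper networks:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
fc = nn.Linear(784, 10)
x = torch.randn(2, 784)
grad_out = torch.ones(2, 10)  # same upstream gradient as in the snippet above

# Per-sample weight gradients, shape [2, 10, 784]: one outer product per row
per_sample_w = torch.einsum('bo,bi->boi', grad_out, x)

# Sanity check: summing over the batch recovers the usual accumulated gradient
out = fc(x)
out.backward(grad_out)
assert torch.allclose(per_sample_w.sum(0), fc.weight.grad, atol=1e-5)
```

The per-sample bias gradient is even simpler, since it is just `grad_out` itself, one row per sample.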


Hello Reza,

How about changing the reduction argument of the loss function you use?

No, that does not solve my issue. The reduction argument only controls whether the output of the loss function is averaged or summed; it does nothing to the gradients. I still need a separate gradient for every data point.
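In recent PyTorch versions (2.0 and later), this can be done without a manual loop using the function transforms in torch.func: wrap a single-sample loss with grad, then vmap it over the batch dimension. A minimal sketch, assuming a plain linear model for simplicity:

```python
import torch
import torch.nn as nn
from torch.func import functional_call, grad, vmap

torch.manual_seed(0)
net = nn.Linear(784, 10)
x = torch.randn(2, 784)

# Treat the parameters as explicit inputs so grad can differentiate w.r.t. them
params = {k: v.detach() for k, v in net.named_parameters()}

def loss_fn(params, sample):
    # Single-sample forward pass; the loss here is just the summed output,
    # matching the ones-vector upstream gradient used in the snippet above
    out = functional_call(net, params, (sample.unsqueeze(0),))
    return out.sum()

# vmap over the batch dimension yields one gradient per data point
per_sample_grads = vmap(grad(loss_fn), in_dims=(None, 0))(params, x)
# per_sample_grads['weight'] has shape [2, 10, 784]
```

This computes all per-sample gradients in a single vectorized call, which is exactly the [2, 10, 784]-shaped result described above.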