torch.autograd.grad - calculate grad for each example in batch separately

First, I’ll explain my goal.
Input: [b,*]
I want to calculate the gradients of a certain layer for each example in the batch separately.
Let’s say my model is a simple FC:

import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.w = nn.Parameter(torch.rand(4, 2), requires_grad=True)
        self.b = nn.Parameter(torch.rand(2), requires_grad=True)

    def forward(self, x):
        out = torch.matmul(x, self.w)  # [b, 4] @ [4, 2] -> [b, 2]
        out = out + self.b
        return out

Creating the model and the criterion (note that reduction is 'none'):

resnet = Model()
criterion = nn.CrossEntropyLoss(reduction='none')

My input is a random 2x4 matrix, where 2 is the batch size:

rand = torch.rand(2,4)


output = resnet(rand)

The output shape is (2,2); the first dimension is the batch.
Declaring a (random) target; the criterion is cross-entropy:

target = torch.LongTensor([1,1])
loss = criterion(output,target)

Now I want to take my losses, which have shape [2], and compute the gradient w.r.t. the bias:

a = torch.autograd.grad(loss,resnet.b)

This returned an error, so I tried:

a = torch.autograd.grad(loss, resnet.b, grad_outputs=torch.ones_like(loss))

This accumulated the gradients…
I expected the returned shape to be (2,2), i.e. the batch size by the layer shape.

Just to recap, I have the loss for each example in the batch separately, using reduction='none'.
Now I want the gradients w.r.t. the bias for each example, the same as if my batch size were 1.
Why am I not simply running with batch=1? For efficiency and speed, of course.

Any idea how to make this work using torch.autograd.grad?


Automatic differentiation can only compute vector-Jacobian products, so the grad_outputs you pass are multiplied by the Jacobian. Since you pass a vector full of ones, it sums up the per-example gradients.
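
To make the vector-Jacobian point concrete, here is a small sketch, assuming the same shapes as in the question (batch of 2, 4 features, 2 classes). The variable names are illustrative only. A vector of ones adds up the rows of the Jacobian, while a one-hot vector picks out a single row:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy setup mirroring the question: 2 examples, 4 features, 2 classes.
x = torch.rand(2, 4)
w = torch.rand(4, 2, requires_grad=True)
b = torch.rand(2, requires_grad=True)
target = torch.tensor([1, 1])

loss = F.cross_entropy(x @ w + b, target, reduction="none")  # shape [2]

# v = ones: v^T J adds up the rows of J, i.e. it sums the per-example gradients.
(summed,) = torch.autograd.grad(
    loss, b, grad_outputs=torch.ones_like(loss), retain_graph=True
)

# v = one-hot: v^T J selects one row of J, i.e. a single example's gradient.
rows = []
for i in range(loss.shape[0]):
    v = torch.zeros_like(loss)
    v[i] = 1.0
    (g,) = torch.autograd.grad(loss, b, grad_outputs=v, retain_graph=True)
    rows.append(g)
jacobian = torch.stack(rows)  # shape [2, 2]: (batch, bias)
```

Summing `jacobian` over the batch dimension should reproduce `summed`, which is the accumulation seen in the question.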

The two main ways to get a gradient for each of your losses are:

  • Do one backward for each of them and store the gradients.
  • Expand your weights to the right batch size before the forward so that the regular backward will give you what you want.
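
The first option might look roughly like the sketch below. It assumes the toy shapes from the question and uses plain tensors `w` and `b` in place of the model's parameters; `retain_graph=True` is needed because the same graph is traversed once per example:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-ins for the model parameters in the question.
w = torch.rand(4, 2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

x = torch.rand(2, 4)
target = torch.tensor([1, 1])

out = x @ w + b
loss = F.cross_entropy(out, target, reduction="none")  # shape [2], one loss per example

per_example = []
for i in range(loss.shape[0]):
    # loss[i] is a scalar, so no grad_outputs is needed.
    # retain_graph=True keeps the graph alive for the next iteration.
    (g,) = torch.autograd.grad(loss[i], b, retain_graph=True)
    per_example.append(g)
bias_grads = torch.stack(per_example)  # shape [2, 2]: (batch, bias)
```

This does one backward pass per example, so it is simple but scales linearly with the batch size.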

Hi @albanD

How do I expand the weights (inside the model) to the batch size? Is there an example?


I’m not sure we have any examples, no.

But maybe you can check repos like cybertronai/autograd-hacks, which provide such features.
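
For what it’s worth, a minimal sketch of the expansion idea for the toy FC model in the question could look like this. It is not taken from autograd-hacks, and the names `w_exp` and `b_exp` are made up for illustration. Each example gets its own (virtual) copy of the parameters, and the gradient is taken w.r.t. those copies:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

B = 2  # batch size
x = torch.rand(B, 4)
target = torch.tensor([1, 1])
w = torch.rand(4, 2, requires_grad=True)
b = torch.rand(2, requires_grad=True)

# Give every example its own (virtual) copy of the parameters.
w_exp = w.unsqueeze(0).expand(B, -1, -1)  # [B, 4, 2]
b_exp = b.unsqueeze(0).expand(B, -1)      # [B, 2]

# Batched matmul: example i only touches w_exp[i] and b_exp[i].
out = torch.bmm(x.unsqueeze(1), w_exp).squeeze(1) + b_exp  # [B, 2]
loss = F.cross_entropy(out, target, reduction="none")      # [B]

# Summing is safe here: loss[i] depends only on copy i, so the gradient of
# the sum w.r.t. the expanded tensors keeps the examples separate.
w_grads, b_grads = torch.autograd.grad(loss.sum(), (w_exp, b_exp))
# w_grads: [B, 4, 2], b_grads: [B, 2]
```

A single backward pass then yields one gradient per example, at the cost of materializing gradients that are B times larger than the parameters.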