First, I’ll explain my goal.

Input: [b,*]

Now I want to calculate the gradients of a certain layer for each of the examples in the batch **separately**.

Let’s say my model is a simple FC:

```
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # a single linear layer: weight (4, 2) and bias (2,)
        self.w = nn.Parameter(torch.rand(4, 2), requires_grad=True)
        self.b = nn.Parameter(torch.rand(2), requires_grad=True)

    def forward(self, x):
        out = torch.mm(x, self.w)  # (batch, 4) x (4, 2) -> (batch, 2)
        out = out + self.b
        return out
```

Creating the model and the criterion (pay attention that reduction='none'):

```
resnet = Model()
criterion = nn.CrossEntropyLoss(reduction='none')
```

My input is a random 2x4 matrix, where 2 is the batch size:

```
rand = torch.rand(2, 4)
```

Running:

```
output = resnet(rand)
```

The output shape is (2, 2); the first dimension is the batch.

Declaring a (random) target; the criterion is cross-entropy:

```
target = torch.LongTensor([1, 1])
loss = criterion(output, target)  # shape (2,): one loss value per example
optimizer.zero_grad()  # optimizer assumed to have been created earlier, e.g. torch.optim.SGD(resnet.parameters(), ...)
```

Now I want to take my losses, which have shape [2], and calculate the gradient w.r.t. the bias parameter.

```
a = torch.autograd.grad(loss, resnet.b)
```

This returned an error (since loss is not a scalar, autograd needs grad_outputs), so I tried:

```
a = torch.autograd.grad(loss, resnet.b,
                        grad_outputs=[torch.ones_like(loss), torch.ones_like(loss)])
```

This ran, but it accumulated (summed) the gradients over the batch, i.e. it gave the same result as taking the gradient of loss.sum() w.r.t. the bias.
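
To make it concrete, here is what I mean by “accumulated” (a minimal sanity check; it assumes the graph is still alive, e.g. by passing retain_graph=True, or by recomputing output and loss first):

```
ones_grad = torch.autograd.grad(loss, resnet.b,
                                grad_outputs=torch.ones_like(loss),
                                retain_graph=True)[0]
sum_grad = torch.autograd.grad(loss.sum(), resnet.b, retain_graph=True)[0]
print(ones_grad.shape)                      # torch.Size([2]) -- only the bias shape, the batch dimension is gone
print(torch.allclose(ones_grad, sum_grad))  # True: the per-example gradients were summed
```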

I expect the returned shape to be (2, 2), i.e. the batch size and the layer (bias) shape.

Just to recap: I already have the loss for each example in the batch separately, thanks to reduction='none'.

Now I want the gradients for each of the examples w.r.t. the bias parameter, the same as if my batch size were **1**.

As for why I’m not simply running with batch=1: efficiency and speed, of course.
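
Written as a naive per-example loop, this is the result I’m after (exactly the kind of loop I’d like to avoid; it is only here to show the shape and values I expect, and again assumes the graph is kept alive):

```
per_example_grads = []
for i in range(loss.shape[0]):
    # gradient of the i-th example's loss w.r.t. the bias, as if the batch size were 1
    g = torch.autograd.grad(loss[i], resnet.b, retain_graph=True)[0]
    per_example_grads.append(g)
per_example_grads = torch.stack(per_example_grads)
print(per_example_grads.shape)  # torch.Size([2, 2]) -- (batch size, bias shape)
```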

Any idea how to make this work using torch.autograd.grad?