First, I’ll explain my goal.

Input: [b,*]

Now I want to calculate the gradients of a certain layer for each of the examples in the batch **separately**.

Let’s say my model is a simple FC:

```
import torch
import torch.nn as nn

class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        # a single linear layer: weight (4, 2) and bias (2,)
        self.w = nn.Parameter(torch.rand(4, 2), requires_grad=True)
        self.b = nn.Parameter(torch.rand(2), requires_grad=True)

    def forward(self, x):
        out = torch.mm(x, self.w)  # (batch, 4) x (4, 2) -> (batch, 2)
        out = out + self.b
        return out
```

Creating the model and the criterion (pay attention that reduction='none'):

```
resnet = Model()
criterion = nn.CrossEntropyLoss(reduction='none')
```

My input is a random 2x4 matrix, where 2 is the batch size:

```
rand = torch.rand(2, 4)
```

Running:

```
output = resnet(rand)
```

The output shape is (2, 2); the first dimension is the batch.

Declaring a (random) target; the criterion is cross-entropy:

```
target = torch.LongTensor([1, 1])
loss = criterion(output, target)  # shape (2,): one loss value per example
optimizer.zero_grad()  # optimizer assumed to have been created earlier, e.g. torch.optim.SGD(resnet.parameters(), ...)
```

Now I want to take my losses, which have shape [2], and calculate the gradient w.r.t. the bias parameter.

```
a = torch.autograd.grad(loss, resnet.b)
```

This returned an error (since loss is not a scalar, autograd needs grad_outputs), so I tried:

```
a = torch.autograd.grad(loss, resnet.b,
                        grad_outputs=[torch.ones_like(loss), torch.ones_like(loss)])
```

This ran, but it accumulated (summed) the gradients over the batch, i.e. it gave the same result as taking the gradient of loss.sum() w.r.t. the bias.
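
To make it concrete, here is what I mean by “accumulated” (a minimal sanity check; it assumes the graph is still alive, e.g. by passing retain_graph=True, or by recomputing output and loss first):

```
ones_grad = torch.autograd.grad(loss, resnet.b,
                                grad_outputs=torch.ones_like(loss),
                                retain_graph=True)[0]
sum_grad = torch.autograd.grad(loss.sum(), resnet.b, retain_graph=True)[0]
print(ones_grad.shape)                      # torch.Size([2]) -- only the bias shape, the batch dimension is gone
print(torch.allclose(ones_grad, sum_grad))  # True: the per-example gradients were summed
```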

I expect the returned shape to be (2, 2), i.e. the batch size and the layer (bias) shape.

Just to recap: I already have the loss for each example in the batch separately, thanks to reduction='none'.

Now I want the gradients for each of the examples w.r.t. the bias parameter, the same as if my batch size were **1**.

As for why I’m not simply running with batch=1: efficiency and speed, of course.
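
Written as a naive per-example loop, this is the result I’m after (exactly the kind of loop I’d like to avoid; it is only here to show the shape and values I expect, and again assumes the graph is kept alive):

```
per_example_grads = []
for i in range(loss.shape[0]):
    # gradient of the i-th example's loss w.r.t. the bias, as if the batch size were 1
    g = torch.autograd.grad(loss[i], resnet.b, retain_graph=True)[0]
    per_example_grads.append(g)
per_example_grads = torch.stack(per_example_grads)
print(per_example_grads.shape)  # torch.Size([2, 2]) -- (batch size, bias shape)
```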

Any idea how to make this work using torch.autograd.grad?