First I'll explain my goal: I want to calculate the gradients of a certain layer for each example in the batch separately.
Let's say my model is a simple FC layer:
class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.w = nn.Parameter(torch.rand(4, 2), requires_grad=True)
        self.b = nn.Parameter(torch.rand(2), requires_grad=True)

    def forward(self, x):
        out = torch.mm(x, self.w)
        out = out + self.b
        return out
Calling the model and criterion (note that reduction is 'none'):
resnet = Model()
criterion = nn.CrossEntropyLoss(reduction='none')
My input is a random 2x4 matrix, where 2 is the batch size:
rand = torch.rand(2,4)
output = resnet(rand)
Output shape is (2, 2); the first dimension is the batch.
Declaring a (random) target; the criterion is cross-entropy:
target = torch.LongTensor([1, 1])
loss = criterion(output, target)
optimizer.zero_grad()
Now, I want to take my losses, which have shape (2,), and calculate the grad w.r.t. the bias layer.
a = torch.autograd.grad(loss,resnet.b)
This returned an error, so I tried:
a = torch.autograd.grad(loss,resnet.b,grad_outputs=[torch.ones_like(loss), torch.ones_like(loss)])
This ran, but it accumulated (summed) the gradients over the batch…
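For what it's worth, passing ones as grad_outputs is equivalent to differentiating loss.sum(), so the per-example gradients get summed away. A minimal standalone check of that (my own repro, using bare w/b parameters in place of the Model class and a fixed seed):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the repro is deterministic

# Same shapes as the model above, just as bare parameters
w = nn.Parameter(torch.rand(4, 2))
b = nn.Parameter(torch.rand(2))
criterion = nn.CrossEntropyLoss(reduction='none')

x = torch.rand(2, 4)
target = torch.LongTensor([1, 1])

loss = criterion(x @ w + b, target)  # shape (2,): one loss per example
g_ones = torch.autograd.grad(loss, b, grad_outputs=torch.ones_like(loss))[0]

# Rebuild the graph and differentiate the summed loss instead
loss2 = criterion(x @ w + b, target)
g_sum = torch.autograd.grad(loss2.sum(), b)[0]

print(torch.allclose(g_ones, g_sum))  # prints True: ones as grad_outputs sums over the batch
```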
I expect the return shape to be (2, 2), i.e. the batch size and then the layer shape.
Just to recap: I have the loss for each example in the batch separately, using reduction='none'.
Now I want the gradients for each example w.r.t. the bias layer, the same as if my batch size were 1.
The reason I'm not simply running with batch=1 is, of course, efficiency and speed.
Any idea how to make it work using torch.autograd.grad?
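For reference, the only approach I've found so far that gives the (2, 2) result I'm after is calling torch.autograd.grad once per example, with retain_graph=True so the graph survives between calls — a sketch under the same simplified setup as above (bare w/b parameters, fixed seed):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # fixed seed so the repro is deterministic

w = nn.Parameter(torch.rand(4, 2))
b = nn.Parameter(torch.rand(2))
criterion = nn.CrossEntropyLoss(reduction='none')

x = torch.rand(2, 4)
target = torch.LongTensor([1, 1])
loss = criterion(x @ w + b, target)  # shape (2,): one loss per example

# Differentiate each loss[i] separately; retain_graph=True keeps the
# graph alive so the next iteration can backprop through it again.
per_sample = torch.stack([
    torch.autograd.grad(loss[i], b, retain_graph=True)[0]
    for i in range(loss.shape[0])
])

print(per_sample.shape)  # torch.Size([2, 2]): (batch, bias shape)
```

This loops in Python, though, so it loses the batching speed-up; newer PyTorch releases also ship torch.func (grad composed with vmap) for vectorized per-sample gradients, if that's an option.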