I would like to calculate the L1 loss between two variables. For instance, network A outputs a, network B outputs b, and I would like a and b to be as close as possible, so the criterion will be L1Loss(a, b). But in PyTorch the target of L1Loss can't be a Variable that requires gradients. I received this error:
AssertionError: nn criterions don't compute the gradient w.r.t. targets - please mark these variables as volatile or not requiring gradients
What should I do to calculate the L1 Loss between two variables?
EDIT: Your loss function could be:
# element-wise absolute difference between the two outputs
loss = torch.abs(outA - outB)
# take the average over the batch (dimension 0)
loss = loss.sum() / outA.size(0)
Thanks. I solved this problem by:
err = torch.mean(torch.abs(err_a - err_b))
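For reference, here is a minimal self-contained sketch of this approach. It assumes a recent PyTorch where tensors track gradients directly (no Variable wrapper); net_a, net_b and the shapes are made up for illustration and are not from the original posts.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two hypothetical networks standing in for A and B
net_a = nn.Linear(3, 2)
net_b = nn.Linear(3, 2)

x = torch.randn(5, 3)  # a made-up batch of 5 inputs
a = net_a(x)
b = net_b(x)

# L1 distance built from ordinary differentiable ops,
# so gradients flow back into both outputs
loss = torch.mean(torch.abs(a - b))
loss.backward()

# Both networks now hold gradients
grads_a = net_a.weight.grad
grads_b = net_b.weight.grad
```

Because no criterion object is involved, neither output has to be treated as a fixed target, which is probably why this works for training both networks at once.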
So this can actually be a bit strange. I think your method will work better, since you (presumably) need to train two networks at once.
When you call the loss function, the first Variable can require gradients but the second cannot. The second argument doesn't have to be volatile either: both can be Variables, you just have to mark the second as not requiring gradients. For example, this code (should) work:
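import torch, torch.nn as nn
from torch.autograd import Variable
self.criterion = nn.L1Loss()
# first argument: may require gradients
self.delta = Variable(torch.FloatTensor(), requires_grad=True)
# second argument (the target): requires_grad defaults to False
next_layer_grad = Variable(torch.FloatTensor())
loss = self.criterion(self.delta, next_layer_grad)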
If you wanted to do it this way you could clone one output and mark it as not requiring gradient, take L1Loss, then swap inputs and do it again.
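A rough sketch of that idea, assuming a recent PyTorch. It uses detach() rather than cloning and flipping requires_grad by hand, which is my substitution and not something stated above; net_a and net_b are hypothetical stand-ins for the two networks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net_a = nn.Linear(3, 2)  # hypothetical network A
net_b = nn.Linear(3, 2)  # hypothetical network B
criterion = nn.L1Loss()

x = torch.randn(5, 3)    # made-up batch
a = net_a(x)
b = net_b(x)

# detach() returns a tensor cut off from the graph (requires_grad=False),
# so it can serve as the target
loss_a = criterion(a, b.detach())  # gradients flow only into net_a
loss_b = criterion(b, a.detach())  # gradients flow only into net_b

(loss_a + loss_b).backward()
```

Each call treats one output as a fixed target, so the two losses have the same value but send gradients to different networks.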
@bzcheeseman, your reply seems to be the proper PyTorch way to do it. Could you please elaborate?
I’m not @bzcheeseman, but I think I can elaborate.
The output of your network will have requires_grad because we need to backprop through it, but your targets should not have requires_grad because we don't need to backprop through them. Since the first argument can have requires_grad but the second argument cannot, the output of the network should always come first.
In other words, this is ok:
criterion(output, target)
but this is not:
criterion(target, output)
EDIT: This is the opposite argument order of the functions in sklearn.metrics, which makes it really annoying to write code that mixes both.
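To make the output-first convention concrete, here is a small sketch assuming a recent PyTorch; model, inputs and targets are hypothetical names and random data, not from the thread. It only shows the supported order and checks the requires_grad flags on each argument.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(3, 1)      # hypothetical network
criterion = nn.L1Loss()

inputs = torch.randn(4, 3)   # made-up batch
targets = torch.randn(4, 1)  # plain data: requires_grad is False by default

outputs = model(inputs)      # depends on the weights, so requires_grad is True

# Network output first, target second
loss = criterion(outputs, targets)
loss.backward()
```

After backward(), model.weight.grad is populated while targets.grad stays None, which matches the point that nothing is backpropagated through the targets.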