How to calculate L1 loss between two variables?

sunshineatnoon · April 25, 2017, 1:48am

I would like to calculate the L1 loss between two variables. For instance, network A outputs a, network B outputs b, and I would like a and b to be as close as possible, so the criterion will be L1Loss(a, b). But in PyTorch the target of L1Loss can’t be a variable that requires gradients. I received this error:

AssertionError: nn criterions don't compute the gradient w.r.t. targets - please mark these variables as volatile or not requiring gradients
What should I do to calculate the L1 Loss between two variables?

vabh · April 25, 2017, 11:40am

Maybe try nn.PairwiseDistance with p=1?
http://pytorch.org/docs/nn.html#torch.nn.PairwiseDistance

EDIT: Your loss function could be:

loss = torch.abs(outA - outB)
# take the average over the batch (first dimension of the outputs)
loss = loss.sum() / outA.size(0)
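
To make the two suggestions above concrete, here is a minimal sketch comparing them. The tensors `outA` and `outB` are random stand-ins for the two network outputs (an assumption for illustration, not the original poster's data), and the batch size of 8 is arbitrary.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical outputs of two networks: batch of 8, 3 features each
outA = torch.randn(8, 3, requires_grad=True)
outB = torch.randn(8, 3, requires_grad=True)
batch_size = outA.size(0)

# Suggestion 1: per-sample L1 distance, then average over the batch
pdist = nn.PairwiseDistance(p=1)
loss_pairwise = pdist(outA, outB).mean()

# Suggestion 2: the manual version
loss_manual = torch.abs(outA - outB).sum() / batch_size

# The two agree up to the small eps that PairwiseDistance adds to the difference
print(loss_pairwise.item(), loss_manual.item())
```

Note that both versions sum over the feature dimension and average only over the batch, whereas nn.L1Loss with its default reduction averages over every element.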

sunshineatnoon · April 26, 2017, 11:30am

Thanks. I solved this problem by:

err = torch.mean(torch.abs(err_a - err_b))
err.backward()
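
A self-contained sketch of this approach might look like the following. The two `nn.Linear` layers are hypothetical stand-ins for network A and network B, and the shapes are chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for "network A" and "network B"
net_a = nn.Linear(4, 3)
net_b = nn.Linear(4, 3)
x = torch.randn(8, 4)

out_a = net_a(x)
out_b = net_b(x)

# Mean absolute difference, built only from differentiable tensor ops,
# so gradients flow back into both networks
err = torch.mean(torch.abs(out_a - out_b))
err.backward()

# Both networks now hold gradients
print(net_a.weight.grad.shape, net_b.weight.grad.shape)
```

Because the loss is written with plain tensor operations rather than a criterion module, neither input is treated as a fixed target.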

bzcheeseman · April 26, 2017, 3:55pm

So this can actually be a bit strange. I think your method will work better, since you (presumably) need to train two networks at once.

When you call the loss function, the first Variable can require gradients, but the second cannot. The second argument also doesn’t have to be volatile. Both can be Variables; you just have to mark the second as not requiring gradients. For example, this code should work:

self.criterion = nn.L1Loss()
self.delta = Variable(torch.FloatTensor([0]), requires_grad=True)
next_layer_grad = Variable(torch.FloatTensor([0]))
loss = self.criterion(self.delta, next_layer_grad)

If you wanted to do it this way, you could clone one output and mark it as not requiring gradients, take the L1Loss, then swap the inputs and do it again.
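
The swap described above could be sketched roughly as follows. This uses `detach()` to obtain a copy of each output that is cut off from the graph, which is my assumption about how to "mark it as not requiring gradient" in current PyTorch. The `nn.Linear` layers are hypothetical placeholders for the two networks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical stand-ins for the two networks
net_a = nn.Linear(4, 3)
net_b = nn.Linear(4, 3)
x = torch.randn(8, 4)
criterion = nn.L1Loss()

out_a = net_a(x)
out_b = net_b(x)

# detach() returns a tensor that does not require gradients,
# so it can act as a fixed target in the second position
loss_a = criterion(out_a, out_b.detach())  # trains net_a toward out_b
loss_b = criterion(out_b, out_a.detach())  # trains net_b toward out_a

# One backward pass over the sum updates the gradients of both networks
(loss_a + loss_b).backward()
```

Each network receives a gradient only from the term where its output is the first argument, so the summed value is twice the plain L1 loss while the per-network gradients match the direct `torch.mean(torch.abs(...))` version.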

smtak · October 13, 2017, 2:23pm

@bzcheeseman, your reply seems to be the proper PyTorch way to do it. Could you please elaborate?

cbarrick · November 13, 2017, 2:44am

I’m not @bzcheeseman, but I think I can elaborate.

The output of your network will have requires_grad set because we need to backprop through it, but your targets should not have requires_grad set because we don’t need to backprop through them. Since the first argument can require gradients but the second argument cannot, the output of the network should always come first.

In other words, this is ok:

loss(model_output, targets)

but this is not:

loss(targets, model_output)

EDIT: This is the opposite of the argument order used by the functions in sklearn.metrics, which makes it really annoying to write code that mixes both.
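
As a small illustration of the argument order described above, here is a sketch with a hypothetical one-layer model and random data (both are assumptions made for the example). It only demonstrates the accepted order: model output first, target second.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)       # hypothetical model
inputs = torch.randn(8, 4)
targets = torch.randn(8, 1)   # plain data: requires_grad is False by default

criterion = nn.L1Loss()
model_output = model(inputs)

# Model output first, target second
loss = criterion(model_output, targets)
loss.backward()

# The output is part of the graph; the target is not
print(model_output.requires_grad, targets.requires_grad)
```

For comparison, sklearn.metrics.mean_absolute_error takes its arguments as (y_true, y_pred), that is, target first.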