How does L1 Loss work? Grad_out not as expected

I am running a simple network and computing L1 loss for debugging. I'm tracking everything, and the numbers don't seem to line up.
Network output:

(Pdb) output
Variable containing:
0.4068 0.4739
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]

Target output:

(Pdb) targets
Variable containing:
0
1
[torch.cuda.LongTensor of size 2 (GPU 0)]

But then from a register_backward_hook on my output layer I am getting

Variable containing:
(0 ,0 ,.,.) =
0.5000
(0 ,1 ,.,.) =
-0.5000
[torch.cuda.FloatTensor of size 1x2x1x1 (GPU 0)]
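(For context: the ±0.5 values are actually consistent with L1 loss. With the old `size_average=True` default, the mean L1 loss is `(1/N) * sum(|y_i - t_i|)`, so its gradient with respect to each output is `sign(y_i - t_i) / N`, and with N = 2 every gradient entry is ±0.5 regardless of how far off the prediction is. A minimal pure-Python sketch of that derivative, using the numbers from the post above:)

```python
# Mean L1 loss: L = (1/N) * sum(|y_i - t_i|)
# Its gradient w.r.t. each output: dL/dy_i = sign(y_i - t_i) / N
output = [0.4068, 0.4739]   # network output from the post
targets = [0.0, 1.0]        # targets from the post
N = len(output)

grad = [(1.0 if y - t > 0 else -1.0) / N for y, t in zip(output, targets)]
# grad == [0.5, -0.5], matching the grad_output seen in the backward hook
```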

Are you sure you’re printing out the grad_output in your backward hook, and not the grad_input?

Yes, it is grad_output

Hmm, here's a very interesting update. When I use MSELoss, it looks like it gives the gradient I expected from L1 loss?

(Pdb) output
Variable containing:
0.4714 0.4087
[torch.cuda.FloatTensor of size 1x2 (GPU 0)]
(Pdb) currentD
Variable containing:
(0 ,0 ,.,.) =
0.4714
(0 ,1 ,.,.) =
-0.5913
[torch.cuda.FloatTensor of size 1x2x1x1 (GPU 0)]
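(That appearance is likely a coincidence of N = 2. The mean MSE loss is `(1/N) * sum((y_i - t_i)^2)`, so its gradient is `2 * (y_i - t_i) / N`; when N = 2 the factor `2/N` cancels and the gradient is exactly `output - target`, which looks like an L1-style difference. A quick pure-Python check against the numbers above:)

```python
# Mean MSE loss: L = (1/N) * sum((y_i - t_i)^2)
# Its gradient w.r.t. each output: dL/dy_i = 2 * (y_i - t_i) / N
output = [0.4714, 0.4087]   # network output from the post
targets = [0.0, 1.0]        # targets from the post
N = len(output)

grad = [2.0 * (y - t) / N for y, t in zip(output, targets)]
# With N == 2 the 2/N factor cancels, so grad == output - target:
# [0.4714, -0.5913], matching the hook output above
```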