hyuntae
(Hyuntae Choi)
#1
Hi.

I have a short question about “torch.nn.L1Loss()”.

As we know, it is not possible to differentiate |x| when x is zero (because the derivative of |x| is x/|x|).

However, when I used torch.nn.L1Loss() with network, loss.backward() worked…

How is it possible?

Also, would you explain how it is possible to calculate gradient of L1 Loss in pytorch when x is zero?

Thanks.

1 Like

spanev
(Serge Panev)
#2
What is `x`

in your example? Is it the input of the model?

Let’s say `x`

your input (with `requires_grad=True`

) and the first layer of the model in a Linear with no bias, parameters `w`

, doing `y = w*x`

.

The last step of the backprop would be `dL/dx = dy/dx * dL/dy`

and `dy/dx = w`

, no matter what is the value of `x`

.

hyuntae
(Hyuntae Choi)
#3
Thanks for reply. And sorry for confusing.

x is just a variable of the equation ‘y = |x|’, not the input or output of network.

I just wonder when try to differentiate the y = |x|, how the torch handle when x is 0 although it is impossible to differentiate at that point.

spanev
(Serge Panev)
#4
Ok, my bad, I mistook the pipes in `|x|`

for `l`

.

I think it uses an approximate gradient on 0. And as you can see here, it considers 0 as a positive input.

1 Like

hyuntae
(Hyuntae Choi)
#5
Thank you very much!

Then, I need to find the implementation of approximate.

Thanks!