# How can custom loss function be backpropagated

I built my loss function using conditional statement like:

```python
def myloss(data):
    if blah blah:
        loss = blah
    if blah blah:
        loss = blah
    return loss

loss = myloss(output)
loss.backward()
```

I was worried it wouldn’t work, but it did.
But how can my loss function be backpropagated?
Is my loss function differentiable?

Thank you in advance for your help.

Hello Hwarang!

In short, the conditional statement doesn’t break anything.

If `loss` inside of your `myloss()` function is calculated with pytorch
tensor operations (that have `backward()` implemented and are
differentiable), backpropagation through `myloss()` will work just fine.

To be concrete, consider:

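```python
def myloss(data):
    # assumes data is a one-element tensor, so `data > 5.0`
    # is a single True/False that `if` can branch on
    if data > 5.0:
        loss = 1.0 * (data**2).sum()
    else:
        loss = 2.0 * (data**3).sum()
    return loss
```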

Mathematically speaking, `myloss()` will be differentiable everywhere
except at `data = 5.0`, which is good enough.
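Assuming `data` holds a single value $x$, the two branches and their derivatives can be written out explicitly:

$$
\text{myloss}(x) =
\begin{cases}
x^2, & x > 5 \\
2x^3, & x \le 5
\end{cases}
\qquad
\frac{d\,\text{myloss}}{dx} =
\begin{cases}
2x, & x > 5 \\
6x^2, & x < 5
\end{cases}
$$

At $x = 5$ the function jumps from $2 \cdot 5^3 = 250$ to a right-hand limit of $5^2 = 25$, so no derivative exists there. At every other point a single branch applies in a small neighborhood, and that branch’s ordinary derivative is the gradient.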

In practice, if `data = 5.0`, `myloss()` will take the second
branch and `loss.backward()` will calculate the gradient that
corresponds to `loss = 2.0 * (data**3).sum()`.
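A minimal runnable sketch of this behavior, assuming `data` is a one-element float tensor. The helper `grad_at()` is hypothetical, added only to evaluate the loss and its gradient at a few points:

```python
import torch

def myloss(data):
    # assumes data is a one-element tensor, so `data > 5.0`
    # is a single True/False that `if` can branch on
    if data > 5.0:
        loss = 1.0 * (data**2).sum()
    else:
        loss = 2.0 * (data**3).sum()
    return loss

def grad_at(value):
    # Fresh leaf tensor each call, so gradients don't accumulate between calls
    data = torch.tensor(value, requires_grad=True)
    loss = myloss(data)
    loss.backward()
    return loss.item(), data.grad.item()

for value in (3.0, 5.0, 6.0):
    loss, grad = grad_at(value)
    print(f"data = {value}: loss = {loss}, d(loss)/d(data) = {grad}")
```

For these inputs it should report gradients of 54.0 at `data = 3.0` ($6x^2$, second branch), 150.0 at `data = 5.0` (also the second branch, since `5.0 > 5.0` is false), and 12.0 at `data = 6.0` ($2x$, first branch). Autograd only records the operations that actually ran, so the branch that was not taken plays no part in the gradient.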

Best.

K. Frank

I’m sorry, but I think I asked the wrong question; it isn’t what I meant.
Actually, my loss function is constructed like this:

```python
import math

def myloss(data):
    tmp = 0
    for i in range(len(data)):
        if data[i] > 0.5:
            tmp += 1
        else:
            tmp += 2
    loss = math.log10(tmp)
    return loss

loss = myloss(output)
loss.backward()
```

In this case, does it work too?

Hello Hwarang!

No, this won’t work. The problem is that this version of `myloss()` isn’t
usefully differentiable. It is constant almost everywhere, so the gradient
will always be zero.

Mathematically, `myloss()` is differentiable (with zero gradient) except
when any `data[i] = 0.5`, at which point `myloss()` jumps
discontinuously and the derivative is not defined.
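The jump can be checked numerically. A small sketch, assuming `data` is a plain Python list of floats (no pytorch is needed for this check) and using a hypothetical `finite_difference()` helper: nudging an element that sits away from 0.5 leaves the loss unchanged, while crossing 0.5 makes it jump.

```python
import math

def myloss(data):
    # The counting loss from the question, on a plain list of floats
    tmp = 0
    for x in data:
        if x > 0.5:
            tmp += 1
        else:
            tmp += 2
    return math.log10(tmp)

def finite_difference(data, i, eps=1e-4):
    # Forward-difference estimate of d(loss)/d(data[i])
    bumped = list(data)
    bumped[i] += eps
    return (myloss(bumped) - myloss(data)) / eps

data = [0.3, 0.7]
print(finite_difference(data, 0), finite_difference(data, 1))  # slopes away from 0.5
print(myloss([0.4999, 0.7]), myloss([0.5001, 0.7]))            # jump across 0.5
```

With `data = [0.3, 0.7]` both finite-difference slopes come out as exactly 0.0, while the two values on either side of 0.5 are $\log_{10} 3$ and $\log_{10} 2$.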

Numerically, with pytorch you will always get zero gradient, even when
some `data[i] = 0.5`, because whichever branch of the conditional
is taken, the quantity computed along that branch is constant.

`myloss()` and backpropagation will “work” in the sense that calling
`loss.backward()` will give you a well-defined gradient, but it doesn’t
actually do you any good because the gradient is always zero.
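As literally written in the question (assuming `output` is a 1-D tensor), `tmp` ends up a plain Python int and `math.log10()` returns a Python float, so `loss.backward()` would actually raise an `AttributeError`. To observe the zero gradient, the loss has to be built from tensor operations. A minimal sketch, assuming a hypothetical tensor rewrite `myloss_tensor()`. The `data * 0.0` term is an illustrative trick that keeps the result attached to the autograd graph without changing its value; comparisons such as `data > 0.5` carry no gradient themselves, so without it pytorch would report that the loss does not require grad.

```python
import torch

def myloss_tensor(data):
    # Hypothetical tensor version of the counting loss from the question:
    # each element contributes 1.0 if it is > 0.5 and 2.0 otherwise.
    per_elem = torch.where(data > 0.5, torch.tensor(1.0), torch.tensor(2.0))
    # Illustrative trick: adding data * 0.0 leaves the values unchanged but
    # connects the result to `data` in the autograd graph.
    per_elem = per_elem + data * 0.0
    return torch.log10(per_elem.sum())

data = torch.tensor([0.3, 0.7, 0.5], requires_grad=True)
loss = myloss_tensor(data)
loss.backward()
print(loss.item(), data.grad)
```

Here `data.grad` comes out as all zeros: each element contributes 1.0 or 2.0 no matter exactly where it sits on its side of 0.5.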

In practical terms, let’s say that `data = 0.5001`, and you get some
value of the loss function. Let’s also say that at `data = 0.5` the
loss function jumps to a lower, more favorable value so that you would
like your optimizer step to update `data` to `0.4999`. The problem is
that the optimizer only knows about the gradient, which is zero, and
doesn’t know that very nearby at `0.4999` you get a lower loss. With zero
gradient the optimizer doesn’t (and can’t) know in which direction to vary
`data`, that is, whether to increase `data`, decrease it, or leave it
unchanged to get to a lower loss.
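The same situation with an actual optimizer step, as a sketch. It assumes a hypothetical tensor version of the question’s counting loss (1.0 per element above 0.5, 2.0 otherwise, plus a `data * 0.0` term so the result stays attached to the autograd graph) and plain `torch.optim.SGD` with an arbitrarily chosen learning rate of 0.1:

```python
import torch

def myloss_tensor(data):
    # Hypothetical tensor rewrite: 1.0 per element above 0.5, 2.0 otherwise;
    # data * 0.0 keeps the result attached to the autograd graph.
    per_elem = torch.where(data > 0.5, torch.tensor(1.0), torch.tensor(2.0)) + data * 0.0
    return torch.log10(per_elem.sum())

data = torch.tensor([0.5001], requires_grad=True)
optimizer = torch.optim.SGD([data], lr=0.1)

before = data.detach().clone()
optimizer.zero_grad()
loss = myloss_tensor(data)
loss.backward()
optimizer.step()

# With an exactly zero gradient, the SGD update leaves data where it was.
print(data.grad, torch.equal(data.detach(), before))
```

Because the gradient is exactly zero, `optimizer.step()` leaves `data` at 0.5001; nothing in the update tells it that the loss changes on the other side of 0.5.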

This is how gradient-descent optimization methods work (pytorch’s
optimizers are of this kind, and backpropagation is what supplies them
with gradients), and it’s an inherent limitation they have.

Best.

K. Frank