# Use max operation in loss function

Hi, I'm very confused.
From everything I know, mathematically, the max operation is not differentiable.
Some people say that it is not differentiable and therefore can't be used in a loss function,
and some say that it is differentiable “enough” for backpropagation.

Does someone have an answer and a good explanation for this issue?

Hello Neta!

Consider this one-dimensional (single-variable) function that
uses `max`:

`f (x) = max (x, 0)`

This function is differentiable for all values of `x` except at
`x = 0`. It is not differentiable exactly at `x = 0`, but the function
isn’t crazy there. You could choose to define the derivative (not
mathematically correctly, though) to be `0` or `1` or `1/2` when
`x = 0`, and for practical purposes, for example, for back-propagation,
that choice will do a perfectly reasonable job. Most of the time you
won’t be back-propagating exactly through `x = 0`, and even if you do,
you probably won’t do so again on the next iteration.
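
Here is a minimal sketch of that point (assuming PyTorch, since the
thread is about autograd); `torch.relu(x)` computes exactly
`max(x, 0)`:

```python
import torch

# Probe the gradient of f(x) = max(x, 0) at and around the kink.
for value in (-1.0, 0.0, 1.0):
    x = torch.tensor(value, requires_grad=True)
    f = torch.relu(x)   # relu(x) == max(x, 0)
    f.backward()
    print(f"x = {value:5}: df/dx = {x.grad.item()}")
```

On the builds I’ve tried, this prints a gradient of `0.0` at `x = 0`
(PyTorch’s ReLU convention) rather than a NaN or an error.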

Similarly, consider this function of two variables:

`f (x, y) = max (x, y)`

This function is differentiable (that is, the partial derivatives
with respect to `x` and `y` are well defined) for all values of `x`
and `y` except along the line where `x = y`.

The same reasoning applies here. You usually won’t try to
back-propagate through a point where `x = y`, and even if
you do, using `0` or `1` or whatever for the partial derivative in
question will be good enough.
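
The same kind of sketch works for the two-variable case (again
assuming PyTorch); `torch.maximum` is the elementwise max of two
tensors:

```python
import torch

# Probe the gradients of f(x, y) = max(x, y) on and off the line x = y.
for vx, vy in ((2.0, 1.0), (1.0, 2.0), (1.0, 1.0)):
    x = torch.tensor(vx, requires_grad=True)
    y = torch.tensor(vy, requires_grad=True)
    torch.maximum(x, y).backward()
    print(f"x = {vx}, y = {vy}: "
          f"df/dx = {x.grad.item()}, df/dy = {y.grad.item()}")
```

Off the tie, all of the gradient flows to the larger argument; at the
tie `x = y`, autograd still returns finite values (exactly how it
splits the gradient there is an implementation detail, and, per the
reasoning above, a harmless one).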

In practice we know – and lots of experience proves – that this
works.

(Now if autograd returned NaN or 10,000,000 or something
when you hit one of the rare points where `max` is not technically
differentiable, that would be a problem. But it doesn’t; it just
uses some reasonable value like `0` or `1` or `1/2`, and everything
works out fine.)