Exponential Loss Function

I was looking through the documentation and I was not able to find the standard exponential loss function. Is there a simple way to implement my own exponential loss function?

Thank you in advance!

Hello Maria!

Sure. And if you use normal tensor operations, autograd
will work for you.

Let’s say you have a binary classification problem and
logits are your predictions for a batch (running from
-infinity to +infinity) and labels are your known class
labels for that batch (equal to 0 or 1).

Then:

import torch

def myExpLoss(logits, labels):
    return (((2.0 * labels.float() - 1.0) * logits).exp()).mean()

logits = torch.tensor([-0.2, 1.3, 0.3], requires_grad=True)
labels = torch.tensor([0, 0, 1])

myExpLoss(logits, labels)

[Edit: The above expression for myExpLoss is missing a minus
sign, and this causes the loss to incorrectly increase for more
accurate predictions. I haven’t edited in the correction above,
but the correct expression appears in my next post, below.]

This yields:

>>> myExpLoss(logits, labels)
tensor(0.9479, grad_fn=<MeanBackward1>)

so you can see that autograd is working.

[Edit: This loss value, 0.9479, is less than 1, incorrectly indicating
that the predictions are more right than wrong. Although the first
and last predictions are right, they are only mildly right, while
the second prediction is strongly wrong, making the predictions
overall more wrong than right. The correct value of the loss (given
by the corrected expression in my next post) is 1.7429.]

This version returns the per-sample exponential loss averaged
over the batch.

As an aside, you might not want to use an exponential loss.
Depending on your problem, some other loss function might
work better.
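(For example, and this is my suggestion rather than something the thread names, PyTorch's built-in BCEWithLogitsLoss is a common choice for binary classification with raw-logit outputs; it takes the same logits and {0, 1} labels, with the labels as floats:)

```python
import torch

# binary cross entropy computed directly on raw logits
criterion = torch.nn.BCEWithLogitsLoss()

logits = torch.tensor([-0.2, 1.3, 0.3])
labels = torch.tensor([0.0, 0.0, 1.0])   # BCEWithLogitsLoss wants float targets

loss = criterion(logits, labels)
print(loss)
```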

Good luck.

K. Frank


Thank you so much! This is so helpful!

Hi! I’m sorry if this has an obvious answer, but I’m struggling a bit to understand the exponential loss function.

Why do you multiply the labels by 2.0 and subtract 1.0?

Thank you so much in advance!

Hi Maria!

In short, 2 * label - 1 maps a label equal to the values {0, 1}
to the values {-1, 1}, to better match your output (the “logit”)
running from -infinity to +infinity.
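In code, the mapping is just:

```python
import torch

labels = torch.tensor([0, 1, 0, 1])

# map {0, 1} class labels to {-1, 1}
signed_labels = 2 * labels - 1
print(signed_labels)   # tensor([-1,  1, -1,  1])
```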

Some more details:

First, I don’t know if there is one single standard definition of
exponential loss function. (There might be; I just don’t know.)

Anyway, I think of it as being:

   exp (mismatch between prediction and actual),

where exp (mismatch) is used to strongly amplify being wrong.

In the case of a binary classification problem, it is often convenient
to have (and you often do have) numerical class labels where
label = 0 indicates the “negative” class and label = 1
indicates the “positive” class.

To be concrete, let’s say that the input to your neural-network
model is 65,536 floating-point grayscale values that represent
the pixels in a 256x256 grayscale image. Your model has a
single floating-point output. Your known class labels are 0 and
1, where 1 indicates that the image is a picture of a bird, and
0 indicates that it is not a bird. (“Not a bird” means that it might
be a picture of nothing, or it might be a picture of something
other than a bird, but it’s not a bird.)

The number-one rule is that your output means whatever you
train it to mean, but we will understand your output as follows:
A large positive number means that your model really thinks
the image is a bird; a moderate positive number means probably
a bird; 0 means your model has no opinion – it might be a bird,
but, equally likely, might not; a negative number means it probably
isn’t a bird. Algebraically larger (more positive or less negative)
means it’s more likely to be a bird, and algebraically smaller (more
negative or less positive) means less likely.

Now to your specific question:

Your label values are 0 and 1. 2 * label - 1 maps these two
values to -1 and 1. When label = 0, exp (label * output)
would always be 1, regardless of the value of output. So, for a
non-bird image (label = 0), your model would not be penalized
for returning a very large, positive output – “Yes, it’s really a bird” –
nor rewarded for returning a negative output – “No, it’s not a bird.”
Mapping {0, 1} to {-1, 1} fixes this.
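A quick numerical sketch of this (the values here are mine, chosen for illustration):

```python
import torch

output = torch.tensor([-5.0, 0.0, 5.0])   # three very different predictions
label = 0.0                               # "not a bird"

# raw {0, 1} label: label = 0 wipes out the output entirely,
# so all three predictions get the same loss term of 1
print(torch.exp(label * output))

# mapped label: 2 * 0 - 1 = -1, so the output still matters --
# the loss term now varies strongly across the three predictions
print(torch.exp((2.0 * label - 1.0) * output))
```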

If you have a sharp eye, you should now notice that I missed a
minus sign in the loss function I posted above. It should actually
be:

def myExpLoss(logits, labels):
    return ((-((2.0 * labels.float() - 1.0) * logits)).exp()).mean()

With this minus sign, this loss penalizes (loss > 1) for being
wrong, and rewards (loss < 1) for being right. (A bigger – more
positive – loss means your model is getting things wrong; a
smaller loss means your model is getting things right.)
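Putting the corrected function together with the example tensors from my first post, the loss now comes out greater than 1, matching the corrected value, 1.7429, quoted in the edit above:

```python
import torch

def myExpLoss(logits, labels):
    # note the minus sign that was missing in the first version
    return ((-((2.0 * labels.float() - 1.0) * logits)).exp()).mean()

logits = torch.tensor([-0.2, 1.3, 0.3], requires_grad=True)
labels = torch.tensor([0, 0, 1])

loss = myExpLoss(logits, labels)
print(loss)   # ≈ 1.7429 -- predictions are, overall, more wrong than right
```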

Good luck.

K. Frank


This makes perfect sense now, thank you so much!

Hi Frank!

“A large positive number means that your model really thinks
the image is a bird; a moderate positive number means probably
a bird; 0 means your model has no opinion – it might be a bird,
but, equally likely, might not; a negative number means it probably
isn’t a bird. Algebraically larger (more positive or less negative)
means it’s more likely to be a bird, and algebraically smaller (more
negative or less positive) means less likely.”

Does this explanation actually also apply to a model whose last node is activated by Tanh()?

Thanks!

Hi ZN!

Let me say again what I said at the beginning of the paragraph
you quoted from:

“The number-one rule is that your output means whatever you
train it to mean …”

So, yes, you can have the output of your model pass through
a final tanh(). And if you then train your model so that
algebraically larger values mean that your sample is more
likely to be a bird, then, yes, that is what the output of your
model will mean.
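As a sketch (the layer sizes here are made up for illustration), such a model might look like this; its output is squashed into (-1, 1), but, once trained that way, algebraically larger still means "more likely a bird":

```python
import torch

torch.manual_seed(0)

# a tiny made-up model whose last node is activated by tanh()
model = torch.nn.Sequential(
    torch.nn.Linear(10, 1),
    torch.nn.Tanh(),
)

x = torch.randn(4, 10)       # a batch of four dummy inputs
out = model(x).squeeze(1)
print(out)                   # four values, each strictly inside (-1, 1)
```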

Best.

K. Frank

Many thanks for your reply!

Cheers!