Categorical cross entropy with soft classes

I looked at https://pytorch.org/docs/stable/_modules/torch/nn/modules/loss.html
and failed to find what I wanted. I basically have a multiclass problem, and the target for each instance is a vector of soft class probabilities.
To be concrete:
neural net output: [0.1, 0.5, 0.4]
correct label: [0.2, 0.4, 0.4]
It seems like this should be standard in the library.

I created a simple Python function to do it, but can anyone please help me wrap it as a proper pytorch loss function?

> import torch as t
>
> def CXE(predicted, target):
>     return -(target * t.log(predicted)).sum(dim=1).mean()

This works for a minibatch. Can I use it in that form, or do I have to make a class out of it, add those decorators (@weak_script_method, @weak_module), and inherit from somewhere?
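For illustration, here is what I mean on a tiny made-up minibatch (two samples, three classes; the numbers are placeholders), using the CXE above:

> # each row of predicted and target is a probability distribution over the classes
> predicted = t.tensor([[0.1, 0.5, 0.4],
>                       [0.3, 0.3, 0.4]], requires_grad=True)
> target = t.tensor([[0.2, 0.4, 0.4],
>                    [0.0, 1.0, 0.0]])
>
> loss = CXE(predicted, target)  # a single scalar, averaged over the batch
> loss.backward()                # gradients flow back into predicted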


Hello John!

I believe that you are correct. As far as I am aware, all of the
pre-packaged pytorch cross-entropy loss functions take class
labels for their targets, rather than probability distributions
across the classes.

Looking at your numbers, it appears that both your predictions
(neural-network output) and your targets (“correct label”) are
probability distributions across the classes. This makes sense.

(One imagines that your predictions are the output of a softmax
layer.)
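(A further aside: if your network actually produces raw scores
(logits) before that softmax, an equivalent and numerically
safer way to get the same loss is to apply log_softmax to the
logits directly, rather than taking the log of already-softmaxed
probabilities. A sketch, with a made-up function name:)

import torch.nn.functional as F

def soft_cxe_from_logits(logits, target):
    # same value as CXE(softmax(logits), target), but log_softmax
    # avoids log(0) when a softmax output underflows to zero
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()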

You can use your CXE function just as it stands. There is no
requirement (or even any particular reason) to wrap it in some kind
of “official” pytorch loss function. Because you are using standard
pytorch tensor functions to do your calculations, autograd will
work correctly without you doing anything special.

So you should be able to run something like:

opt.zero_grad()                  # clear gradients from the previous step
predicted = model(input)         # forward pass over the batch
loss = CXE(predicted, target)    # soft-target cross entropy from above
loss.backward()                  # autograd differentiates through CXE
opt.step()                       # update the model's parameters

(Here, input, predicted, and target are understood as being
for a batch of samples.)
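And if you would still prefer to package it as a loss module, so
that it looks and composes like the built-in losses, a minimal
sketch along these lines should do. (The class name here is just
made up, and you do not need any of those @weak_script_method /
@weak_module decorators for it.)

import torch
import torch.nn as nn

class SoftTargetCrossEntropy(nn.Module):
    # cross entropy where each row of target is a probability
    # distribution over the classes
    def forward(self, predicted, target):
        return -(target * torch.log(predicted)).sum(dim=1).mean()

You would construct it once, criterion = SoftTargetCrossEntropy(),
and then call criterion(predicted, target) in place of
CXE(predicted, target) in the loop above.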

Good luck.

K. Frank


That is the most well-written and thorough answer I’ve seen in a while. Hats off. Many thanks Frank.
