CrossEntropyLoss backpropagation

Sam-gege · September 29, 2021, 5:48am

Hi, the docs say "This criterion combines LogSoftmax and NLLLoss in one single class." Does that mean the two modules are simply chained, i.e. the output of LogSoftmax is fed into the input of NLLLoss?

I ask because I have learnt that when these two modules are combined, the backpropagation can be simplified. For example, if the inputs are x1, x2, their softmax outputs are s1, s2, and the targets are y1, y2, then dLoss/dx1 = s1(y1+y2) - y1. Since y1+y2 is just 1, it drops out and dLoss/dx1 = s1 - y1, so we don't need to construct a Jacobian because there's no contribution from y2. However, when there are weights, i.e. w1y1+w2y2, or ignore_index, i.e. 0*y1+y2, the sum is no longer 1 and y2 can't be cancelled out, so perhaps there's no difference from simply connecting the two modules together? (Not sure, this is my speculation.)
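To make the weighted case concrete, here is a minimal pure-Python sketch (not PyTorch's implementation). It assumes the loss is L = -(w1*y1*log(s1) + w2*y2*log(s2)), which gives dL/dx_k = s_k*(w1*y1 + w2*y2) - w_k*y_k, and compares that formula with a finite-difference estimate. The helper names (softmax, weighted_ce, analytic_grad, numeric_grad) are made up for illustration.

```python
import math

def softmax(x):
    # Subtract the max for numerical stability.
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_ce(x, y, w):
    # Assumed loss: L = -sum_i w_i * y_i * log(s_i)
    s = softmax(x)
    return -sum(wi * yi * math.log(si) for wi, yi, si in zip(w, y, s))

def analytic_grad(x, y, w):
    # dL/dx_k = s_k * sum_i(w_i * y_i) - w_k * y_k
    s = softmax(x)
    total = sum(wi * yi for wi, yi in zip(w, y))
    return [sk * total - wk * yk for sk, wk, yk in zip(s, w, y)]

def numeric_grad(x, y, w, eps=1e-6):
    # Central finite differences, used only as a sanity check.
    grads = []
    for k in range(len(x)):
        hi = list(x)
        lo = list(x)
        hi[k] += eps
        lo[k] -= eps
        grads.append((weighted_ce(hi, y, w) - weighted_ce(lo, y, w)) / (2 * eps))
    return grads

x = [0.3, -1.2]   # inputs x1, x2
y = [1.0, 0.0]    # one-hot target y1, y2
w = [2.0, 0.5]    # class weights w1, w2 (arbitrary example values)

print(analytic_grad(x, y, w))
print(numeric_grad(x, y, w))
```

Under these assumptions the sum w1*y1 + w2*y2 is no longer 1, but it is still a single scalar (the weight of the target class for a one-hot target, or 0 for an ignored sample), so the gradient stays a scaled version of s - y and the full Jacobian is still not needed.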

I don't understand C, so I can't confirm this myself. Could someone help me with it? I'm a new learner trying to implement these modules myself. So far I've implemented LogSoftmax and NLLLoss, so it would be easy if CrossEntropyLoss simply connects them together. Thanks for any help.

Sam-gege · September 29, 2021, 8:17am

Oh, never mind, they are simply chained together. I had assumed this one wasn't written in Python like the others.
In nn/functional.py it has:
return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
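A quick way to check this from Python, without reading the C source, is to compare the loss and the input gradients of both paths. This is a rough sketch assuming PyTorch is installed; the tensor shapes, weights, and ignore_index value are arbitrary example choices.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)               # 4 samples, 3 classes (example sizes)
target = torch.tensor([0, 2, 1, 2])
weight = torch.tensor([1.0, 2.0, 0.5])   # per-class weights (example values)
ignore_index = 1                         # the sample with target 1 is skipped

# Two independent copies so each path accumulates its own gradient.
a_in = logits.clone().requires_grad_(True)
b_in = logits.clone().requires_grad_(True)

# Path 1: the combined criterion.
fused = F.cross_entropy(a_in, target, weight=weight, ignore_index=ignore_index)

# Path 2: log_softmax followed by nll_loss, chained by hand.
chained = F.nll_loss(F.log_softmax(b_in, dim=1), target,
                     weight=weight, ignore_index=ignore_index)

fused.backward()
chained.backward()

print(torch.allclose(fused, chained))       # do the loss values agree?
print(torch.allclose(a_in.grad, b_in.grad)) # do the gradients agree?
```

If both checks agree, a hand-written CrossEntropyLoss that just calls your own LogSoftmax and NLLLoss in sequence should behave the same way, including with weight and ignore_index.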