Backward of CrossEntropyLoss

Martina_Ragulikova · January 7, 2023, 11:19am

Hi, thank you. Let's think about MSE loss first.
This is the forward pass:

((labels - prediction) ** 2).mean()

And this is the backward pass:

N = labels.numel()  # mean() divides by the total number of elements (equals labels.shape[0] for 1-D tensors)
first_grad = -2*(labels - prediction) / N

first_grad is then propagated back through the preceding layer to get the gradients of its weights and bias.
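
One way to sanity-check the hand-written MSE backward is to compare it with what autograd produces. This is only a minimal sketch: it assumes prediction and labels are 1-D tensors of the same length, and the random data and the size 8 are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical data: 8 samples, one regression target each.
labels = torch.randn(8)
prediction = torch.randn(8, requires_grad=True)

# Forward: mean squared error, as in the post.
loss = ((labels - prediction) ** 2).mean()
loss.backward()  # autograd fills prediction.grad

# Manual backward: d(loss) / d(prediction).
N = labels.numel()  # same as labels.shape[0] for 1-D tensors
first_grad = -2 * (labels - prediction.detach()) / N

# Compare the manual gradient with the autograd gradient.
mse_grads_match = torch.allclose(first_grad, prediction.grad)
print(mse_grads_match)
```

If the two gradients agree, the manual formula can be trusted for this shape of input.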

The question is: how do I do the same with cross-entropy loss?
This is its forward pass:


ce = -labels*(y_pred - y_pred.exp().sum(1).log().unsqueeze(1))
result = ce.sum(dim=1).mean()

How do I compute the backward pass from this?
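
For reference, here is a minimal sketch of one possible manual backward. It assumes that labels is one-hot (each row sums to 1) and that y_pred holds raw logits of shape (N, C); the sizes, class indices, and random data are hypothetical. Under those assumptions, differentiating the forward above with respect to y_pred gives (softmax(y_pred) - labels) / N, which can be checked against autograd:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

N, C = 4, 3  # hypothetical batch size and number of classes
y_pred = torch.randn(N, C, requires_grad=True)              # raw logits
labels = F.one_hot(torch.tensor([0, 2, 1, 2]), C).float()   # one-hot targets

# Forward, as in the post: -labels * log_softmax(y_pred), summed over classes, averaged over the batch.
ce = -labels * (y_pred - y_pred.exp().sum(1).log().unsqueeze(1))
result = ce.sum(dim=1).mean()
result.backward()  # autograd fills y_pred.grad

# Manual backward: (softmax - labels) / N.
# This form relies on each row of labels summing to 1.
first_grad = (torch.softmax(y_pred.detach(), dim=1) - labels) / N

# Compare the manual gradient with the autograd gradient.
ce_grads_match = torch.allclose(first_grad, y_pred.grad, atol=1e-6)
print(ce_grads_match)
```

As in the MSE case, first_grad here would then be passed back to the layer that produced y_pred.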