Why doesn't this custom softmax negative log-likelihood loss work?

I have since realized that the negative log-likelihood function in PyTorch has a reduction argument that could be used instead of my method below, but I still can't explain why this doesn't work when it looks like it should.

In this case y was a vector containing all of the integer class labels. I verified that the values were very close to what I would get from functional.nll_loss in torch, but the model failed to learn, so there must have been some problem that didn't raise an error, though I don't know what it is. Can anyone see why this didn't work?

loss = -torch.log(F.softmax(logits, dim=1)[:, y])
loss.mean().backward()
optimizer.step()

You would get approximately the same results assuming your indexing is right (I had to use [torch.arange(y.size(0)), y] if y has the shape [batch_size]; [:, y] selects a [batch_size, batch_size] block instead of one probability per sample).
However, you might run into numerical stability issues, as you are applying torch.log and F.softmax separately.
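For example, with a large spread in the logits the softmax can underflow to exactly zero in float32, and the subsequent log then returns -inf, whereas F.log_softmax stays finite. A minimal sketch with made-up logit values chosen just to trigger the underflow:

import torch
import torch.nn.functional as F

logits = torch.tensor([[0.0, -200.0]])  # made-up values with a large spread

# separate softmax + log: exp(-200) underflows to 0. in float32, so log gives -inf
print(torch.log(F.softmax(logits, dim=1)))  # tensor([[0., -inf]])

# fused log_softmax works in log space and stays finite
print(F.log_softmax(logits, dim=1))         # tensor([[0., -200.]])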

You could try to use loss = -F.log_softmax(logits, dim=1)[torch.arange(y.size(0)), y] instead and compare your results with the unreduced reference loss.
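Something like this minimal sketch, where logits and y are random placeholders with the shapes you described ([batch_size, num_classes] and [batch_size]) and the reference is the unreduced F.nll_loss:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
batch_size, num_classes = 4, 5
logits = torch.randn(batch_size, num_classes)     # placeholder model outputs
y = torch.randint(0, num_classes, (batch_size,))  # placeholder integer class labels

log_probs = F.log_softmax(logits, dim=1)

# pick the log-probability of the true class for each sample -> shape [batch_size]
# (note: log_probs[:, y] would instead give a [batch_size, batch_size] block)
custom = -log_probs[torch.arange(y.size(0)), y]

# unreduced reference loss
reference = F.nll_loss(log_probs, y, reduction='none')

print(torch.allclose(custom, reference))  # True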

I see, thanks! I hadn't noticed the mistake in my indexing, which meant the loss was computed over the wrong entries, so the gradients were effectively meaningless and the model didn't learn anything.