Misunderstanding of CrossEntropyLoss

I am a bit confused about how CrossEntropyLoss works, or perhaps I am not using it properly.

I understand the mathematical formula for cross-entropy loss and am trying to implement it for AlphaZero.

My issue is with the policy part of the loss function, which uses CrossEntropyLoss. I am new to this, but when I run CrossEntropyLoss on the predicted policy's probabilities against the labeled policy's probabilities, it doesn't give me a number near 0 even when the model is close. Even when I pass an identical tensor as both input and target, it still returns a fairly large non-zero number, so I am confused as to how this helps as a loss function.

Here is an example where I try out CrossEntropyLoss and also compute cross entropy manually as the sum of the policy multiplied by the log of the predictions:

import torch
import torch.nn as nn

# Each row is intended to be a probability distribution over 4 moves
input = torch.FloatTensor([[.2, .3, .3, .2], [.3, .2, .3, .2], [0, 0, .8, .2], [0.1, 0.1, .6, .2]])
logInput = input.log()
cross_entropy_loss = nn.CrossEntropyLoss(reduction="none")  # per-sample losses
output = cross_entropy_loss(input, input)
# Manual cross entropy: -sum(p * log(p)) per row, treating 0 * log(0) as 0
output2 = torch.sum(-input * torch.nan_to_num(logInput, neginf=0.0), dim=1)

print('output: ', output)
print('output2: ', output2)

output: tensor([1.3775, 1.3775, 1.0151, 1.2390])
output2: tensor([1.3662, 1.3662, 0.5004, 1.0889])

When I use it on a larger tensor (my real policy tensors have length 4672), the results get even stranger and the outputs all end up around 8.4.

Three questions:

  1. Am I doing something wrong in trying to use cross-entropy loss? Of course, in practice I would run the predictions against the labels rather than the input against itself; this was just to test the loss function.
  2. If this is the correct output of CrossEntropyLoss, how will it help the AlphaZero algorithm learn?
  3. Why are the two outputs above different? I thought the manual computation was exactly what CrossEntropyLoss calculates.

Thank you

CrossEntropyLoss expects raw model scores (logits) in (-∞, +∞), not probabilities.
Internally it combines LogSoftmax and NLLLoss.
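
To make this concrete, here is a small sketch (the probability-style target in the last part assumes PyTorch 1.10 or newer, which accepts class probabilities as targets). It shows that CrossEntropyLoss on raw logits matches log_softmax followed by nll_loss, and that feeding it probabilities applies softmax a second time, which is why a near-uniform vector of length 4672 comes out close to log(4672) ≈ 8.45:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Raw scores (logits) -- any real values, no softmax applied yet
logits = torch.tensor([[1.2, -0.4, 0.3, 2.0],
                       [0.0, 1.5, -1.0, 0.2]])
targets = torch.tensor([3, 1])  # class indices

ce = nn.CrossEntropyLoss()
print(ce(logits, targets))                                 # cross entropy on logits
print(F.nll_loss(F.log_softmax(logits, dim=1), targets))   # same value

# CrossEntropyLoss softmaxes its input, so passing probabilities
# softmaxes them a second time. A near-uniform probability vector of
# length N then gives a loss near log(N); log(4672) is about 8.45.
probs = torch.full((1, 4672), 1.0 / 4672)
print(ce(probs, probs))                  # soft targets need PyTorch >= 1.10
print(torch.log(torch.tensor(4672.0)))   # ~8.4495

For an AlphaZero-style policy target (a full distribution over moves), one option is to compute the cross entropy yourself, e.g. torch.sum(-pi * F.log_softmax(logits, dim=1), dim=1).mean(), where pi is the target distribution and logits are the network's raw policy outputs. Note that its minimum is the entropy of pi rather than zero, so a small non-zero loss for a perfect prediction is expected.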