I am a bit confused about how CrossEntropyLoss works, or maybe I am not using it properly.

I understand the mathematical formula for cross-entropy loss and am trying to implement it for AlphaZero.

My issue is with the policy portion of the loss function, which uses cross-entropy. I am new to this, but when I run CrossEntropyLoss on the predicted policy's probabilities against the labeled policy's probabilities, it doesn't give me a number near 0 even when the model is close. Even when I pass an identical tensor as both input and target, it still returns a fairly large non-zero number, so I am confused about how this is helping me as a loss function.

Here is an example where I try CrossEntropyLoss and also compute cross entropy manually as the sum of the policy multiplied by the log of the predictions:

```python
import torch
import torch.nn as nn

input = torch.FloatTensor([[.2, .3, .3, .2],
                           [.3, .2, .3, .2],
                           [0, 0, .8, .2],
                           [.1, .1, .6, .2]])

# reduction='none' gives a per-sample loss
cross_entropy_loss = nn.CrossEntropyLoss(reduction='none')

output = cross_entropy_loss(input, input)

# manual cross entropy: -sum(target * log(prediction)), masking log(0)
output2 = torch.sum(-input * torch.nan_to_num(input.log(), neginf=0.0), dim=1)

print('output: ', output)
print('output2: ', output2)
```

```
output:  tensor([1.3775, 1.3775, 1.0151, 1.2390])
output2:  tensor([1.3662, 1.3662, 0.5004, 1.0889])
```
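While experimenting, I did notice that if I insert a `log_softmax` step into my manual formula, the numbers match PyTorch's exactly, though I am not sure what to make of that. Here is the sketch I used to check (same tensor as above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input = torch.FloatTensor([[.2, .3, .3, .2],
                           [.3, .2, .3, .2],
                           [0, 0, .8, .2],
                           [.1, .1, .6, .2]])

# manual formula as before, but with log_softmax(input) instead of input.log()
manual = torch.sum(-input * F.log_softmax(input, dim=1), dim=1)

# PyTorch's per-sample loss on the same input/target
builtin = nn.CrossEntropyLoss(reduction='none')(input, input)

print(manual)   # matches builtin
print(builtin)
```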

When I use it on a larger tensor (for example, one of length 4672), it gets very wonky and the outputs all end up around 8.4.
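For reference, here is roughly what I am seeing at that size, using a random probability vector as a stand-in for my policy (`n = 4672` matches my policy length; the value is suspiciously close to ln(4672) ≈ 8.45, but I don't know if that's relevant):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 4672

# a random probability distribution over n moves (stand-in for a policy vector)
probs = torch.softmax(torch.randn(n), dim=0).unsqueeze(0)

# feed the probabilities as both input and target, as in my small example
loss = nn.CrossEntropyLoss()(probs, probs)
print(loss)  # ~8.45, close to ln(4672)
```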

Three questions:

- Am I doing something wrong in my implementation of cross-entropy loss? (Of course, instead of running it against itself I would run it against predictions; this was just to test the loss function.)
- If this is the correct output of CrossEntropyLoss, how will it help the AlphaZero algorithm learn?
- Why are the two outputs above different? I thought the manual sum was exactly what CrossEntropyLoss computes.

Thank you