I am a bit confused about how CrossEntropyLoss works, or maybe I am not using it properly.

I understand the mathematical formula for cross-entropy loss and am trying to implement it for AlphaZero.

My issue is with the policy portion of the loss function, which uses cross-entropy. I am new to this, but when I run CrossEntropyLoss on the predicted policy's probabilities against the labeled policy's probabilities, it doesn't give me a number near 0 even when the model is close. Even when I pass an identical tensor as both input and target, it still returns a fairly large non-zero number, so I am confused about how this is helping me as a loss function.

Here is an example where I try CrossEntropyLoss and also compute cross entropy manually as the sum of the policy multiplied by the log of the predictions:

```python
import torch
import torch.nn as nn

input = torch.FloatTensor([[.2, .3, .3, .2],
                           [.3, .2, .3, .2],
                           [0, 0, .8, .2],
                           [.1, .1, .6, .2]])

# reduction='none' gives a per-sample loss
cross_entropy_loss = nn.CrossEntropyLoss(reduction='none')

output = cross_entropy_loss(input, input)

# manual cross entropy: -sum(target * log(prediction)), masking log(0)
output2 = torch.sum(-input * torch.nan_to_num(input.log(), neginf=0.0), dim=1)

print('output: ', output)
print('output2: ', output2)
```

```
output:  tensor([1.3775, 1.3775, 1.0151, 1.2390])
output2:  tensor([1.3662, 1.3662, 0.5004, 1.0889])
```
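While experimenting, I did notice that if I insert a `log_softmax` step into my manual formula, the numbers match PyTorch's exactly, though I am not sure what to make of that. Here is the sketch I used to check (same tensor as above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input = torch.FloatTensor([[.2, .3, .3, .2],
                           [.3, .2, .3, .2],
                           [0, 0, .8, .2],
                           [.1, .1, .6, .2]])

# manual formula as before, but with log_softmax(input) instead of input.log()
manual = torch.sum(-input * F.log_softmax(input, dim=1), dim=1)

# PyTorch's per-sample loss on the same input/target
builtin = nn.CrossEntropyLoss(reduction='none')(input, input)

print(manual)   # matches builtin
print(builtin)
```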

When I use it on a larger tensor (for example, one of length 4672), it gets very wonky and the outputs all end up around 8.4.
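For reference, here is roughly what I am seeing at that size, using a random probability vector as a stand-in for my policy (`n = 4672` matches my policy length; the value is suspiciously close to ln(4672) ≈ 8.45, but I don't know if that's relevant):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n = 4672

# a random probability distribution over n moves (stand-in for a policy vector)
probs = torch.softmax(torch.randn(n), dim=0).unsqueeze(0)

# feed the probabilities as both input and target, as in my small example
loss = nn.CrossEntropyLoss()(probs, probs)
print(loss)  # ~8.45, close to ln(4672)
```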

Three questions:

- Am I doing something wrong in my implementation of cross-entropy loss? (Of course, instead of running it against itself I would run it against predictions; this was just to test the loss function.)
- If this is the correct output of CrossEntropyLoss, how will it help the AlphaZero algorithm learn?
- Why are the two outputs above different? I thought the manual sum was exactly what CrossEntropyLoss computes.

Thank you