KL Divergence produces negative values

Did you normalize the values with log_softmax? KLDivLoss expects the input to be log-probabilities (and the target to be probabilities); otherwise the result can come out negative:

# size_average=False is the deprecated form of reduction='sum'
torch.nn.KLDivLoss(reduction='sum')(F.log_softmax(scores, -1), targets)
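
Here is a minimal, self-contained sketch of the difference (the tensor names scores and targets and their shapes are just assumptions for illustration): passing raw scores can produce a negative loss, while passing log_softmax output keeps it non-negative.

import torch
import torch.nn.functional as F

torch.manual_seed(0)
scores = torch.randn(4, 10)                       # raw, unnormalized logits
targets = F.softmax(torch.randn(4, 10), dim=-1)   # valid probability distributions

kl = torch.nn.KLDivLoss(reduction='sum')

# Raw scores are not log-probabilities, so the result can be negative.
print(kl(scores, targets))

# With log_softmax the input is a proper log-distribution,
# and the KL divergence is guaranteed to be >= 0.
print(kl(F.log_softmax(scores, dim=-1), targets))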