KL Divergence produces negative values

Same issue here. I created a post over here, since my negative KL divergence values might be caused by torch.nn.functional.softmax producing outputs that don't sum exactly to 1.
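
In case it helps anyone following this thread, here is a minimal sketch (the logits are made up purely for illustration) of how to check how far the softmax rows drift from summing to 1, and of computing the KL divergence in the form PyTorch documents for F.kl_div, where the first argument is log-probabilities and the target is probabilities:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Made-up logits purely for illustration
p_logits = torch.randn(4, 10)
q_logits = torch.randn(4, 10)

# Softmax rows should sum to 1, but only up to floating-point rounding
p = F.softmax(p_logits, dim=-1)
print(p.sum(dim=-1))                       # values very close to 1.0
print((p.sum(dim=-1) - 1.0).abs().max())   # tiny residual, ~1e-7 for float32

# F.kl_div expects log-probabilities as input and probabilities as target
# (with the default log_target=False); in this form the result should be
# >= 0 up to numerical precision
log_q = F.log_softmax(q_logits, dim=-1)
kl = F.kl_div(log_q, p, reduction="batchmean")
print(kl)
```

With both arguments in the expected form, the tiny rounding error in the softmax sums is usually far too small to push the result negative on its own.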