KL divergence loss

According to the docs:

As with NLLLoss , the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).

Your code snippet looks alright. I would recommend using `log_softmax` instead of `softmax().log()`, as the former approach is numerically more stable.
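A minimal sketch of the expected usage (shapes and tensors are made up for illustration): the input passed to `kl_div` is log-probabilities via `log_softmax`, while the target stays as plain probabilities:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical example: batch of 4 samples, 5 classes.
logits = torch.randn(4, 5)
target_logits = torch.randn(4, 5)

# Input must contain log-probabilities; log_softmax is more
# numerically stable than softmax().log().
log_probs = F.log_softmax(logits, dim=1)

# Targets are given as probabilities (no log taken).
target_probs = F.softmax(target_logits, dim=1)

loss = F.kl_div(log_probs, target_probs, reduction="batchmean")
print(loss.item())
```

With `reduction="batchmean"` the summed KL divergence is divided by the batch size, which matches the mathematical definition of KL divergence per sample.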
