KL divergence loss

According to the docs:

As with NLLLoss , the input given is expected to contain log-probabilities and is not restricted to a 2D Tensor. The targets are given as probabilities (i.e. without taking the logarithm).

Your code snippet looks alright. I would recommend using `log_softmax` instead of `softmax().log()`, as the former approach is numerically more stable.
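A minimal sketch of the expected usage (shapes and tensors are made up for illustration): the input passed to `kl_div` is log-probabilities via `log_softmax`, while the target stays as plain probabilities:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical example: batch of 4 samples, 5 classes.
logits = torch.randn(4, 5)
target_logits = torch.randn(4, 5)

# Input must contain log-probabilities; log_softmax is more
# numerically stable than softmax().log().
log_probs = F.log_softmax(logits, dim=1)

# Targets are given as probabilities (no log taken).
target_probs = F.softmax(target_logits, dim=1)

loss = F.kl_div(log_probs, target_probs, reduction="batchmean")
print(loss.item())
```

With `reduction="batchmean"` the summed KL divergence is divided by the batch size, which matches the mathematical definition of KL divergence per sample.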
