Nikronic
(Nikan Doosti)
September 26, 2019, 9:24pm
Hi,
I have not read the distributions package source code, but from what I know of the C++ source code, I prefer using the torch.nn.functional.kl_div function to calculate the divergence. Here is the relevant part of the C++ implementation:
Tensor kl_div(const Tensor& input, const Tensor& target, int64_t reduction) {
  auto zeros = at::zeros_like(target);
  auto output_pos = target * (at::log(target) - input);
  auto output = at::where(target > 0, output_pos, zeros);
  return apply_loss_reduction(output, reduction);
}
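To make the formula concrete, here is a minimal Python sketch of the same pointwise computation (the function and tensor names are my own, not PyTorch's; input is expected in log-space and target in probability space):

import torch

def kl_div_pointwise(input, target):
    # Mirrors the C++ above: each element is target * (log(target) - input),
    # with positions where target == 0 forced to zero (0 * log 0 is taken as 0).
    output_pos = target * (target.log() - input)
    return torch.where(target > 0, output_pos, torch.zeros_like(target))

Summing the pointwise terms corresponds to F.kl_div(..., reduction='sum').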
Based on the source code, you should provide log-probabilities for the first argument (input); the target stays as plain probabilities. Also notice that PyTorch uses kl_div like this: F.kl_div(b, a) corresponds to KL(a||b) = sum a * log(a / b), with b given in log-space. So you need the following code to get the same result as TensorFlow:
import torch
import torch.nn.functional as F

preds_torch = torch.Tensor(preds)    # preds, labels: the arrays from your example
labels_torch = torch.Tensor(labels)
out = F.kl_div(labels_torch.log(), preds_torch, reduction='sum')
print(out.item())  # 0.2038460671901703
Also, it is equivalent to:
out = (preds_torch * (preds_torch / labels_torch).log()).sum()  # KL(preds || labels) computed directly
print(out.item())
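If you want to double-check that both forms agree on your data, a quick sanity check (using the preds_torch and labels_torch defined above, and assuming all entries of preds_torch are strictly positive so the manual formula has no 0 * log 0 terms) could be:

kl_builtin = F.kl_div(labels_torch.log(), preds_torch, reduction='sum')
kl_manual = (preds_torch * (preds_torch / labels_torch).log()).sum()
print(torch.allclose(kl_builtin, kl_manual))  # expected: True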
In the end, I am really not sure about the distributions package yet. I will check it out and let you know if you are interested.
Further reading:
GitHub issue: Does F.kl_div(a, b) mean KL(b||a), not KL(a||b)?
Good luck
Nik