Nikronic
(Nikan Doosti)
September 26, 2019, 9:24pm
Hi,
I have not read the distributions package source code, but from what I know of the C++ source code, I prefer using the torch.nn.functional.kl_div function to calculate the divergence. Here is the relevant part of the C++ implementation:
Tensor kl_div(const Tensor& input, const Tensor& target, int64_t reduction) {
  auto zeros = at::zeros_like(target);
  auto output_pos = target * (at::log(target) - input);
  auto output = at::where(target > 0, output_pos, zeros);
  return apply_loss_reduction(output, reduction);
}
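To make the formula concrete, here is a minimal Python sketch of the same pointwise computation (the function and tensor names are my own, not PyTorch's; input is expected in log-space and target in probability space):

import torch

def kl_div_pointwise(input, target):
    # Mirrors the C++ above: each element is target * (log(target) - input),
    # with positions where target == 0 forced to zero (0 * log 0 is taken as 0).
    output_pos = target * (target.log() - input)
    return torch.where(target > 0, output_pos, torch.zeros_like(target))

Summing the pointwise terms corresponds to F.kl_div(..., reduction='sum').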
Based on the source code, you should provide log-probabilities for the first argument (input); the target stays as plain probabilities. Also notice that PyTorch uses kl_div like this: F.kl_div(b, a) corresponds to KL(a||b) = sum a * log(a / b), with b given in log-space. So you need the following code to get the same result as TensorFlow:
import torch
import torch.nn.functional as F

preds_torch = torch.Tensor(preds)    # preds, labels: the arrays from your example
labels_torch = torch.Tensor(labels)
out = F.kl_div(labels_torch.log(), preds_torch, reduction='sum')
print(out.item())  # 0.2038460671901703
Also, it is equivalent to:
out = (preds_torch * (preds_torch / labels_torch).log()).sum()  # KL(preds || labels) computed directly
print(out.item())
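If you want to double-check that both forms agree on your data, a quick sanity check (using the preds_torch and labels_torch defined above, and assuming all entries of preds_torch are strictly positive so the manual formula has no 0 * log 0 terms) could be:

kl_builtin = F.kl_div(labels_torch.log(), preds_torch, reduction='sum')
kl_manual = (preds_torch * (preds_torch / labels_torch).log()).sum()
print(torch.allclose(kl_builtin, kl_manual))  # expected: True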
In the end, I am really not sure about the distributions package yet. I will check it out and let you know if you are interested.
Further reading:
GitHub issue: Does F.kl_div(a, b) mean KL(b||a), not KL(a||b)?
Good luck
Nik