I started receiving negative KL divergences between a target Dirichlet distribution and my model’s output Dirichlet distribution. Someone online suggested that this might indicate that the parameters of the Dirichlet distribution don’t sum to 1. I thought this was ridiculous, since the output of the model is passed through
output = F.softmax(self.weights(x), dim=1)
But after looking into it more closely, I found that torch.all(torch.sum(output, dim=1) == 1.) returns False! Looking at a problematic row, output[5], I see that it is tensor([0.0085, 0.9052, 0.0863], grad_fn=<SelectBackward>). Yet torch.sum(output[5]) == 1. produces tensor(False).
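Here is a minimal sketch of what I am seeing, with random logits standing in for self.weights(x):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 3)              # stand-in for self.weights(x)
output = F.softmax(logits, dim=1)       # each row should be a probability vector

row_sums = output.sum(dim=1)
print(row_sums)                         # looks like all ones when printed
print(torch.all(row_sums == 1.))        # frequently tensor(False)
print((row_sums - 1.).abs().max())      # tiny residual, on the order of 1e-7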
What am I misusing about softmax such that output probabilities do not sum to 1?
You are testing for exact equality with a floating-point result.
Try looking at torch.sum(output[5]) - 1.0 and see whether the result is small, that is, on the order of floating-point round-off error (about 10^-7, relative, for float32).
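For example, a quick sketch of the tolerance-based check (using torch.allclose and torch.isclose with their default tolerances) instead of exact equality:

import torch
import torch.nn.functional as F

output = F.softmax(torch.randn(8, 3), dim=1)
row_sums = output.sum(dim=1)

print(row_sums - 1.0)                                        # residuals around 1e-7
# Compare against 1 with a tolerance rather than with ==:
print(torch.allclose(row_sums, torch.ones_like(row_sums)))   # True
print(torch.isclose(row_sums, torch.ones_like(row_sums)))    # elementwise tensor of True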