Problem using softmax

Here is the Variable before softmax:


After softmax, the largest element becomes 1 and all the others become 0.

My forward function:

def forward(self, x):
    x = F.relu(self.conv1(x))
    x = F.relu(self.conv2(x))
    x = F.relu(self.conv3(x))
    x = F.relu(self.conv4(x))
    x = F.relu(self.conv5(x))
    x = F.relu(self.conv6(x))
    x = F.relu(self.conv7(x))
    x = x.view(x.size(0), -1)
    x = F.softmax(x, dim=1)  # softmax over the flattened feature dimension
    return x

Is there something wrong with my usage of softmax?

This is to be expected. exp(-20) is about 2e-9, so if your largest input to the softmax is larger than the others by about 20 (in absolute terms), the softmax will be extremely spiked. (And the softmax is saturated, i.e. you have vanishing gradients, see e.g. http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ for lots of details.)
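
For illustration, a minimal sketch (the logit values below are made up, chosen so that the largest one exceeds the rest by 20 or more) showing how such a gap already produces an essentially one-hot output:

import torch
import torch.nn.functional as F

# hypothetical logits; the largest value exceeds the others by 20 or more
logits = torch.tensor([[150.0, 130.0, 128.0, 125.0]])

probs = F.softmax(logits, dim=1)
print(probs)  # approx. [[1.0, 2.1e-09, 2.8e-10, 1.4e-11]] -- essentially one-hot
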
The canonical way to overcome this is to use a temperature parameter to divide the inputs (see e.g. https://en.wikipedia.org/wiki/Softmax_function#Reinforcement_learning ). For your example, the maximal difference seems to be ~150, so you could try an initial temperature of ~200 to get several non-zero probabilities.
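
In code, the temperature trick is just a division before the softmax. A minimal sketch, using made-up logits with a spread of ~150 and the initial temperature of ~200 suggested above:

import torch
import torch.nn.functional as F

temperature = 200.0  # tune this; larger values flatten the output distribution

# hypothetical logits with a spread of ~150, as in the post
logits = torch.tensor([[150.0, 80.0, 30.0, 0.0]])

print(F.softmax(logits, dim=1))                # essentially one-hot
print(F.softmax(logits / temperature, dim=1))  # approx. [[0.37, 0.26, 0.20, 0.17]]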

Best regards

Thomas


The numbers are too large, so their exp overflows. I think you should add a batch norm layer before the convolution layers.
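
A minimal sketch of that suggestion (the channel sizes and number of layers are made up, since the original model definition isn't shown), with a batch norm layer inserted before each convolution:

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    # illustrative only; not the poster's actual model
    def __init__(self):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(3)                   # normalize the input to conv1
        self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
        self.bn2 = nn.BatchNorm2d(16)                  # normalize the input to conv2
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)

    def forward(self, x):
        x = F.relu(self.conv1(self.bn1(x)))
        x = F.relu(self.conv2(self.bn2(x)))
        x = x.view(x.size(0), -1)
        return F.softmax(x, dim=1)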

@tom Thank you for your reply! It really helps a lot.

@chenyuntc Thanks! I've tried that and it works.