def forward(self, x):
    # Stack of convolutions, each followed by a ReLU activation
    x = F.relu(self.conv1(x))
    x = F.relu(self.conv2(x))
    x = F.relu(self.conv3(x))
    x = F.relu(self.conv4(x))
    x = F.relu(self.conv5(x))
    x = F.relu(self.conv6(x))
    x = F.relu(self.conv7(x))
    # Flatten to (batch_size, num_features) before the softmax
    x = x.view(x.size(0), -1)
    # Pass dim explicitly; calling F.softmax without it is deprecated
    x = F.softmax(x, dim=1)
    return x
This is to be expected: exp(-20) is about 2e-9, so if the largest input to the softmax is about 20 larger than the others, the softmax output will be extremely spiked. (The softmax is then saturated, i.e. you get vanishing gradients; see e.g. http://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/ for lots of details.)
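You can see the effect with a quick standalone check (the numbers below are illustrative, not your actual activations):

import torch
import torch.nn.functional as F

# One logit is 30 larger than the rest
logits = torch.tensor([[30.0, 0.0, 0.0, 0.0]])
print(F.softmax(logits, dim=1))
# roughly [1.0, 9.4e-14, 9.4e-14, 9.4e-14] -- virtually all mass on one class,
# and the gradient of the softmax is essentially zero everywhere.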
The canonical way to overcome this is to divide the inputs by a temperature parameter (see e.g. https://en.wikipedia.org/wiki/Softmax_function#Reinforcement_learning ). For your example the maximal difference seems to be ~150, so you could try an initial temperature of ~200 to get several non-zero probabilities.
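A minimal sketch of the effect (the logit values and the temperature of 200 are just assumptions based on the ~150 spread you reported):

import torch
import torch.nn.functional as F

logits = torch.tensor([[150.0, 0.0, 75.0, 10.0]])  # spread of ~150

# Without a temperature, the largest logit takes essentially all the mass
print(F.softmax(logits, dim=1))
# roughly [1.0, ~0, ~0, ~0]

# Dividing by a temperature of ~200 flattens the distribution
temperature = 200.0
print(F.softmax(logits / temperature, dim=1))
# roughly [0.38, 0.18, 0.26, 0.19]

In your forward pass that would just mean changing the last step to F.softmax(x / temperature, dim=1), where temperature could be a fixed constant or an attribute you set in __init__.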