nn.Softmax and nn.Softmax2d do not give a sum of 1

I tested both,

giving an input x of size (1, 1, s, s):

nn.Softmax()(x[0, 0])
nn.Softmax2d()(x)

The sum of the array / feature map is 100 in the first case and 1000000 in the second.

The maximum value is 1, but it seems like softmax is not applied over all the array items together.

Which PyTorch version are you using?
You should get a warning in 0.3.1 that the implicit dimension choice for softmax has been deprecated.
In your first example, the softmax is calculated in dim=1, so that softmax(x[0, 0]).sum(1) will return ones.

The second example calculates the softmax in the channels, i.e. also dim=1.
Since you just have one channel, all values will be ones.
Change it to x = Variable(torch.randn(1, 3, 10, 10)), which could be an output of a segmentation model.
Now, the channels will sum to one.
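
To illustrate, here is a minimal sketch of that multi-channel case (the shapes are arbitrary); after the softmax, the values at each spatial location sum to 1 across the channels:

import torch
import torch.nn as nn
from torch.autograd import Variable

# Sketch: three "class" channels, as in a segmentation output
x = Variable(torch.randn(1, 3, 10, 10))
softmax = nn.Softmax2d()  # same as nn.Softmax(dim=1) on a 4D input
y = softmax(x)
print(y.sum(dim=1))  # all ones: each pixel's channel values sum to 1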

Yes, I'm getting that warning, and I had thought of this answer, but I wanted to show the different outputs of the two functions in my case.

The second function should still work, so why does it give ones everywhere in the case of 1 channel? I don't have three channels; my output is (P, 1, s, s), and I want to perform softmax on the single channel I have as a feature map. P is my batch size and s is the image dimension.

You need different output channels to apply softmax on them.
For example, if you would like to output a segmentation, where each channel stores the probabilities for one class, you would have [batch_size, n_class, w, h] as the output dimension.
Now you could call softmax on it and each pixel will have the probability of belonging to the class corresponding to the channel, i.e. output[0, 0, ...] will give you a probability map of class one for every pixel.

If you only have one output channel, it seems you have a binary classification task?
In this case you could use a sigmoid instead.
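
As a rough sketch of that binary case (the shapes here are made up), a sigmoid gives each pixel an independent probability between 0 and 1:

x = Variable(torch.randn(4, 1, 10, 10))  # single-channel logits
probs = torch.sigmoid(x)  # per-pixel probabilities in (0, 1)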

Actually, I'm not interested in channel normalization or class probabilities across channels, and this isn't a question about classification. I'm doing post-processing on the last feature map, where I want to calculate the softmax of the entire matrix to get a normalized heatmap of this channel.

The implementation in numpy is easy, but I would still like to have it in PyTorch running on the GPU.

So you would like to calculate the softmax over all logits in the Tensor, such that the sum over all pixels returns one?
In this case, you could try the following:

x = Variable(torch.randn(1, 1, 10, 10))
softmax = nn.Softmax(dim=0)
# Flatten to 1D, apply softmax over all elements, then restore the original shape
y = softmax(x.view(-1)).view(1, 1, 10, 10)
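
The sum over the whole tensor should now be 1, since the softmax was applied over all 100 elements at once:

print(y.sum())  # 1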

If I misunderstood your question, could you post the numpy code so that I can have a look?

If I have only one input, this way of flattening the array works perfectly, but I might have (10, 1, 10, 10).
By reversing the operation with .view(1, 1, 10, 10), do all the values return to their original places, or might there be some displacement?

They will be returned to their original place.
Have a look at this example:

# Create an input filled with 1s (first sample) and 2s (second sample)
x = Variable(torch.cat((torch.ones(1, 1, 10, 10), torch.ones(1, 1, 10, 10) * 2), dim=0))
softmax = nn.Softmax(dim=0)
# Flatten everything, apply softmax over all 200 elements, then restore the shape
y = softmax(x.view(-1)).view(2, 1, 10, 10)
print(y)
print(y.sum())  # sums to 1 over the whole tensor

Now, the whole Tensor is normalized, such that its sum is 1.

The sum of the whole tensor is 1. What I want is for the sum to be 1 for each sample P in the tensor (P, 1, s, s):

a softmax that normalizes each 2D tensor over its spatial values, not over the channels and not over the batch.

Yeah, I thought so, which is why I mentioned it again.
I had misunderstood your earlier statement.

You could use the following code:

x = Variable(torch.cat((torch.ones(1, 1, 10, 10), torch.ones(1, 1, 10, 10) * 2), dim=0))
softmax = nn.Softmax(dim=1)
# Flatten each sample to one row, apply softmax per row, then restore the shape
y = softmax(x.view(2, -1)).view(2, 1, 10, 10)

Now, each batch sample will sum to 1:

print(y[0, 0, ...].sum())
print(y[1, 0, ...].sum())

Does this work for you?

@ptrblck Following the example you gave, is the following right? Thank you in advance.

x = torch.ones(4, 1, 4, 4)
b, c, h, w = x.size()
softmax = nn.Softmax(dim=1)
# Flatten each sample, apply softmax per sample, then restore the original shape
y = softmax(x.view(b, -1)).view(b, c, h, w)

This would work for the use case in this thread, i.e. normalizing each sample in the batch to visualize heatmaps.
It's not the usual classification use case, in case that's what you were looking for!

That's what I need: a spatial softmax, meaning the sum over all pixels of each sample in the batch equals 1.
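
For reference, this per-sample normalization could be wrapped into a small helper. The sketch below assumes a recent PyTorch version (no Variable wrapping needed); spatial_softmax is just an illustrative name, not a PyTorch function:

import torch
import torch.nn.functional as F

def spatial_softmax(x):
    # Softmax over everything except the batch dimension,
    # so each sample's feature map sums to 1
    b = x.size(0)
    return F.softmax(x.view(b, -1), dim=1).view_as(x)

# Usage with a (P, 1, s, s) batch of heatmaps
x = torch.randn(4, 1, 10, 10)
y = spatial_softmax(x)
print(y.view(4, -1).sum(1))  # each entry is 1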