nn.Softmax and nn.Softmax2d do not give a sum of 1

I tested both,

giving an input x of size (1, 1, s, s):

nn.Softmax()(x[0, 0])
nn.Softmax2d()(x)

The sum of the array / feature map is 100 in the first case and 1000000 in the second.

The maximum value is 1, but it seems like softmax is not applied over all the array items together.

Which PyTorch version are you using?
You should get a warning in 0.3.1 that the implicit dimension choice for softmax has been deprecated.
In your first example, the softmax is calculated in dim=1, so that softmax(x[0, 0]).sum(1) will return ones.

The second example calculates the softmax in the channels, i.e. also dim=1.
Since you just have one channel, all values will be ones.
Change it to x = Variable(torch.randn(1, 3, 10, 10)), which could be an output of a segmentation model.
Now, the channels will sum to one.
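
To illustrate, here is a minimal sketch of that multi-channel case (the shapes are arbitrary); after the softmax, the values at each spatial location sum to 1 across the channels:

import torch
import torch.nn as nn
from torch.autograd import Variable

# Sketch: three "class" channels, as in a segmentation output
x = Variable(torch.randn(1, 3, 10, 10))
softmax = nn.Softmax2d()  # same as nn.Softmax(dim=1) on a 4D input
y = softmax(x)
print(y.sum(dim=1))  # all ones: each pixel's channel values sum to 1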

Yes, I'm getting that warning, and I had thought of this answer, but I wanted to show the different outputs of the two functions in my case.

The second function should still work, so why does it give ones everywhere in the case of 1 channel? I don't have three channels; my output is (P, 1, s, s), and I want to perform softmax on the single channel I have as a feature map. P is my batch size and s is the image dimension.

You need different output channels to apply softmax on them.
For example, if you would like to output a segmentation, where each channel stores the probabilities for one class, you would have [batch_size, n_class, w, h] as the output dimension.
Now you could call softmax on it and each pixel will have the probability of belonging to the class corresponding to the channel, i.e. output[0, 0, ...] will give you a probability map of class one for every pixel.

If you only have one output channel, it seems you have a binary classification task?
In this case you could use a sigmoid instead.
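
As a rough sketch of that binary case (the shapes here are made up), a sigmoid gives each pixel an independent probability between 0 and 1:

x = Variable(torch.randn(4, 1, 10, 10))  # single-channel logits
probs = torch.sigmoid(x)  # per-pixel probabilities in (0, 1)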

Actually, I'm not interested in channel normalization or class probabilities across channels, and this isn't a question about classification. I'm doing post-processing on the last feature map, where I want to calculate the softmax of the entire matrix to get a normalized heatmap of this channel.

The implementation in numpy is easy, but I would still like to have it in PyTorch running on the GPU.

So you would like to calculate the softmax over all logits in the Tensor, such that the sum over all pixels returns one?
In this case, you could try the following:

x = Variable(torch.randn(1, 1, 10, 10))
softmax = nn.Softmax(dim=0)
# Flatten to 1D, apply softmax over all elements, then restore the original shape
y = softmax(x.view(-1)).view(1, 1, 10, 10)
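
The sum over the whole tensor should now be 1, since the softmax was applied over all 100 elements at once:

print(y.sum())  # 1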

If I misunderstood your question, could you post the numpy code so that I can have a look?

If I have only one input, this way of flattening the array works perfectly, but I might have (10, 1, 10, 10).
By reversing the operation with .view(1, 1, 10, 10), do all the values return to their original places, or might there be some displacement?

They will be returned to their original place.
Have a look at this example:

# Create an input filled with 1s (first sample) and 2s (second sample)
x = Variable(torch.cat((torch.ones(1, 1, 10, 10), torch.ones(1, 1, 10, 10) * 2), dim=0))
softmax = nn.Softmax(dim=0)
# Flatten everything, apply softmax over all 200 elements, then restore the shape
y = softmax(x.view(-1)).view(2, 1, 10, 10)
print(y)
print(y.sum())  # sums to 1 over the whole tensor

Now, the whole Tensor is normalized, such that its sum is 1.

The sum of the whole tensor is 1. What I want is for the sum to be 1 for each sample P in the tensor (P, 1, s, s):

a softmax that normalizes each 2D tensor over its spatial values, not over the channels and not over the batch.

Yeah, I thought so, which is why I mentioned it again.
I had misunderstood your earlier statement.

You could use the following code:

x = Variable(torch.cat((torch.ones(1, 1, 10, 10), torch.ones(1, 1, 10, 10) * 2), dim=0))
softmax = nn.Softmax(dim=1)
# Flatten each sample to one row, apply softmax per row, then restore the shape
y = softmax(x.view(2, -1)).view(2, 1, 10, 10)

Now, each batch sample will sum to 1:

print(y[0, 0, ...].sum())
print(y[1, 0, ...].sum())

Does this work for you?

@ptrblck Following the example you gave, is the following right? Thank you in advance.

x = torch.ones(4, 1, 4, 4)
b, c, h, w = x.size()
softmax = nn.Softmax(dim=1)
# Flatten each sample, apply softmax per sample, then restore the original shape
y = softmax(x.view(b, -1)).view(b, c, h, w)

This would work for the use case in this thread, i.e. normalizing each sample in the batch to visualize heatmaps.
It's not the usual classification use case, in case that's what you were looking for!

That's what I need: a spatial softmax, meaning the sum over all pixels of each sample in the batch equals 1.
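
For reference, this per-sample normalization could be wrapped into a small helper. The sketch below assumes a recent PyTorch version (no Variable wrapping needed); spatial_softmax is just an illustrative name, not a PyTorch function:

import torch
import torch.nn.functional as F

def spatial_softmax(x):
    # Softmax over everything except the batch dimension,
    # so each sample's feature map sums to 1
    b = x.size(0)
    return F.softmax(x.view(b, -1), dim=1).view_as(x)

# Usage with a (P, 1, s, s) batch of heatmaps
x = torch.randn(4, 1, 10, 10)
y = spatial_softmax(x)
print(y.view(4, -1).sum(1))  # each entry is 1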