Which PyTorch version are you using?
In 0.3.1 you should get a warning that the implicit dimension choice for softmax has been deprecated.
In your first example, the softmax is computed along dim=1, so softmax(x[0, 0]).sum(1) returns ones.
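Passing `dim` explicitly silences the deprecation warning and makes the reduction axis obvious. A minimal sketch, assuming a single-sample, single-channel input of shape (1, 1, 10, 10) as in the question and a recent PyTorch where plain tensors replace `Variable`:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 10, 10)

# x[0, 0] is a 2D (10, 10) tensor; the old implicit choice for 2D input was dim=1
row_probs = F.softmax(x[0, 0], dim=1)

# Each row now sums to one, so summing over dim=1 gives a vector of ten ones
print(row_probs.sum(1))
```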

The second example computes the softmax over the channels, i.e. also along dim=1.
Since you have just one channel, every value will be one.
Change it to x = Variable(torch.randn(1, 3, 10, 10)), which could be the output of a segmentation model.
Now the values across the channels sum to one at each pixel.
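A small sketch contrasting the two cases, assuming a recent PyTorch (plain tensors instead of `Variable`). With one channel, each pixel's softmax is taken over a single value, so it is always exp(v) / exp(v) = 1; with three channels, the three values at each pixel form a distribution:

```python
import torch
import torch.nn.functional as F

# One channel: softmax over dim=1 sees a single value per pixel, so every output is 1
single = F.softmax(torch.randn(1, 1, 10, 10), dim=1)
print(torch.allclose(single, torch.ones_like(single)))  # True

# Three channels: at every pixel, the three channel values sum to one
multi = F.softmax(torch.randn(1, 3, 10, 10), dim=1)
print(torch.allclose(multi.sum(dim=1), torch.ones(1, 10, 10)))  # True
```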

Yes, I'm getting that warning and suspected this was the reason, but I wanted to show the different outputs of the two functions in my case.

The second function should work, so why does it give ones everywhere with one channel? I don't have three channels; my output has shape (P, 1, s, s), and I want to apply softmax to the single channel I have as a feature map. P is my batch size and s is the image size.

You need several output channels for a softmax across them to be meaningful.
For example, if you want to output a segmentation where each channel stores the probabilities for one class, the output would have shape [batch_size, n_class, w, h].
Calling softmax on the channel dimension then gives each pixel the probability of belonging to the class of the corresponding channel, i.e. output[0, 0, ...] is the probability map of the first class for every pixel of the first sample.
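A minimal sketch of that segmentation setup, using hypothetical sizes and random logits in place of a real model output (the spatial dims are written as h, w here, following PyTorch's usual NCHW order):

```python
import torch
import torch.nn.functional as F

batch_size, n_class, h, w = 2, 4, 8, 8
# Hypothetical raw model output (logits) for a segmentation task
logits = torch.randn(batch_size, n_class, h, w)

# Normalize across the class channels, independently at every pixel
probs = F.softmax(logits, dim=1)

# Probability map of the first class for the first sample, shape (h, w)
class0_map = probs[0, 0]

# Most likely class per pixel, shape (batch_size, h, w)
pred = probs.argmax(dim=1)
```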

If you only have one output channel, is this a binary classification task?
In that case you could use a sigmoid instead.
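For the single-channel binary case, a sketch with hypothetical random logits: the sigmoid maps every pixel independently into (0, 1), so the values do not sum to one over the map. The 0.5 threshold below is an assumption, not something from this thread:

```python
import torch

# Hypothetical single-channel logits, shape (batch, 1, H, W)
logits = torch.randn(2, 1, 8, 8)

# Each pixel becomes an independent foreground probability in (0, 1)
probs = torch.sigmoid(logits)

# Threshold (assumed 0.5) to obtain a binary mask
mask = probs > 0.5
```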

Actually, I'm not interested in normalizing across channels or in class probabilities, and it isn't a classification problem either. I'm post-processing the last feature map, and I want to compute the softmax over the entire matrix to get a normalized heatmap of this channel.

The NumPy implementation is easy, but I would still like to have it in PyTorch running on the GPU.
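The NumPy code was not posted, so the following is only an assumed reconstruction of the idea: exponentiate every value and divide by the sum over the whole matrix. `global_softmax_np` is a hypothetical helper name; subtracting the global maximum first avoids overflow without changing the result:

```python
import numpy as np

def global_softmax_np(a):
    """Hypothetical helper: softmax over every element of `a`, so the whole array sums to one."""
    # Subtracting the global max is a standard trick for numerical stability
    e = np.exp(a - a.max())
    return e / e.sum()

heatmap = np.random.randn(10, 10)
normalized = global_softmax_np(heatmap)
print(normalized.sum())  # approximately 1.0
```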

So you would like to calculate the softmax over all logits in the tensor, such that the sum over all pixels equals one?
In this case, you could try the following:

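import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.randn(1, 1, 10, 10))
# Flatten to 1D, normalize over all 100 values, then restore the original shape
softmax = nn.Softmax(dim=0)
y = softmax(x.view(-1)).view(1, 1, 10, 10)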
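To check that this does what is intended, the sketch below (assuming PyTorch 0.4 or later, where `Variable` is no longer needed) verifies that all values sum to one and that the result matches the textbook formula exp(x) / sum(exp(x)) computed over the whole tensor:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 10, 10)
softmax = nn.Softmax(dim=0)
y = softmax(x.view(-1)).view(1, 1, 10, 10)

# All 100 values together sum to one
print(y.sum())

# Same result written out explicitly: exp(x) / sum(exp(x)) over the whole tensor
manual = torch.exp(x) / torch.exp(x).sum()
print(torch.allclose(y, manual))  # True
```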

If I misunderstood your question, could you post the numpy code so that I can have a look?

If I have only one input, this way of flattening the array works perfectly, but I might have an input of shape (10, 1, 10, 10).
When reshaping back with .view(1, 1, 10, 10), do all the values return to their original positions, or might some be displaced?
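`view` only reinterprets the shape of the same underlying data and never reorders elements, so flattening and viewing back with the original shape restores every value to its original position. The sketch below checks this for a batch shaped (10, 1, 10, 10) as in the question. It also shows that `x.view(b, -1)` keeps one sample per row, which is what a per-sample softmax needs; flattening the whole batch with `view(-1)` and applying softmax on dim 0 would instead normalize across all samples together. Note that `view` requires a compatible (e.g. contiguous) memory layout, whereas `reshape` copies when necessary.

```python
import torch

x = torch.randn(10, 1, 10, 10)
b, c, h, w = x.size()

# Flatten everything, then restore the shape: elements keep their positions
round_trip = x.view(-1).view(b, c, h, w)
print(torch.equal(round_trip, x))  # True

# Flattening per sample keeps row i equal to sample i flattened
flat = x.view(b, -1)  # shape (10, 100)
print(torch.equal(flat[3], x[3].reshape(-1)))  # True
```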

@ptrblck Following the example you gave, is the following correct? Thanks in advance.
import torch
import torch.nn as nn

x = torch.ones(4, 1, 4, 4)
b, c, h, w = x.size()
softmax = nn.Softmax(dim=1)
y = softmax(x.view(b, -1)).view(b, c, h, w)

Yes, this works for the use case in this thread, i.e. normalizing each sample in the batch separately to visualize heatmaps.
Keep in mind that it's not the usual classification setup, in case that's what you're looking for.
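With `torch.ones` as input, every output is simply 1/16, since all 16 logits of a sample are equal, so the check below uses random values instead. It is a sketch assuming a recent PyTorch (`flatten` and `view_as` are existing tensor methods); it confirms that each sample's heatmap sums to one on its own and shows an equivalent, slightly shorter way to write the same operation:

```python
import torch
import torch.nn.functional as F

# Random values instead of ones, so the result is not just uniform 1/16 everywhere
x = torch.randn(4, 1, 4, 4)
b, c, h, w = x.size()

# Softmax over all c*h*w values of each sample, then restore the original shape
y = F.softmax(x.view(b, -1), dim=1).view(b, c, h, w)

# Each sample's heatmap sums to one on its own
print(y.sum(dim=(1, 2, 3)))

# flatten(1) / view_as is an equivalent way to write the same thing
y2 = F.softmax(x.flatten(1), dim=1).view_as(x)
print(torch.allclose(y, y2))  # True
```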