# How to do softmax for a BxCxMxN tensor channel-wise

I want to apply softmax to each channel of a tensor. I expected the elements of each channel to sum to one, but that is not what I get.
This post shows how to do it for a tensor, but in a batch-wise manner.

Can someone tell me what I should do to apply softmax to each channel so that the sum within each channel is 1?

``````
import torch
import torch.nn.functional as F
from torch.autograd import Variable

A = Variable(torch.rand(1, 2, 3, 3))
print(A)

print(F.softmax(A, dim=0).sum())

print(F.softmax(A, dim=1).sum())
``````

Variable containing:
(0 ,0 ,.,.) =
0.5912 0.3723 0.0399
0.6684 0.8080 0.6185
0.1265 0.2973 0.5427

(0 ,1 ,.,.) =
0.3595 0.4951 0.2176
0.0471 0.8907 0.7543
0.0262 0.8329 0.6792
[torch.FloatTensor of size 1x2x3x3]

Variable containing:
18
[torch.FloatTensor of size 1]

Variable containing:
9
[torch.FloatTensor of size 1]

Hi,

You need to sum across the dimension you applied the softmax over, not the whole tensor. So `.sum(0)` in the first case and `.sum(1)` in the second.
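To make the point concrete, here is a small sketch (my own illustration, assuming a recent PyTorch where plain tensors replace `Variable`). It shows why the totals printed in the question were 18 and 9: summing the whole tensor counts one "1" per softmax slice.

```python
import torch
import torch.nn.functional as F

A = torch.rand(1, 2, 3, 3)

# Softmax over dim=1 (channels): at every (batch, row, col) position
# the values across the 2 channels sum to 1.
per_position = F.softmax(A, dim=1).sum(1)   # shape (1, 3, 3), all ones

# Summing the whole tensor instead adds up one "1" per position: 1 * 3 * 3 = 9.
total_dim1 = F.softmax(A, dim=1).sum()

# Softmax over dim=0 (batch of size 1) turns every element into 1,
# so the total is the number of elements: 1 * 2 * 3 * 3 = 18.
total_dim0 = F.softmax(A, dim=0).sum()

print(per_position)
print(total_dim1.item(), total_dim0.item())
```

Neither call normalizes over the 3x3 spatial positions, which is why neither gives a per-channel sum of 1.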

Hello,

But that still does not answer my question.
Let's look at a simpler example:

``````A =Variable(torch.rand(1,1,3,3))
print(A)

Variable containing:
(0 ,0 ,.,.) =
0.7806  0.2611  0.7685
0.3393  0.4488  0.0576
0.8112  0.4408  0.6531
[torch.FloatTensor of size 1x1x3x3]
``````

How can I apply softmax to tensor A so that its values sum to 1?
If I do
`F.softmax((A), dim=1)`
or
`F.softmax((A), dim=0)`

it gives me

``````(0 ,0 ,.,.) =
1  1  1
1  1  1
1  1  1
[torch.FloatTensor of size 1x1x3x3]
``````

Please note that I used a single channel for simplicity. I will have more than one channel, and I want to apply softmax to each channel of the tensor so that the values of each channel sum to 1 after the softmax.

If there is a single element in the dimension you softmax over, then the value will be 1 for it. This is what happens in this case.
If there is more than one element, then their sum will be 1.
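A quick sketch of both cases (an illustration I am adding, using plain tensors and arbitrary sizes):

```python
import torch
import torch.nn.functional as F

# Dimension 1 has size 1: each element is normalized against itself,
# so softmax returns exp(x) / exp(x) = 1 everywhere.
A = torch.rand(1, 1, 3, 3)
single = F.softmax(A, dim=1)

# Dimension 1 has size 4: the 4 values at each position share a total of 1,
# so each individual value is strictly below 1.
B = torch.rand(1, 4, 3, 3)
multi = F.softmax(B, dim=1)

print(single)
print(multi.sum(1))
```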

How should I apply softmax to each channel of a tensor of shape `BxCxMxN`, then?

Hi,

I think I don’t understand what you want.
Given a Tensor of size `BxCxMxN`, which of the following should return a Tensor full of ones?

``````out.sum(1) ?
out.sum(-1) ?
out.sum(-1).sum(-1) ?
``````

I'm sorry for the confusion.
Here is what I want.
Consider tensor A:

``````A =Variable(torch.rand(1,2,3,3))
print(A)

Variable containing:
(0 ,0 ,.,.) =
0.5396  0.1361  0.7871
0.5187  0.1430  0.2143
0.5917  0.0184  0.5073

(0 ,1 ,.,.) =
0.8257  0.2010  0.5715
0.3362  0.3824  0.5582
0.8907  0.3006  0.5311
[torch.FloatTensor of size 1x2x3x3]
``````

I would like to apply softmax to each channel separately, so that the result for each channel looks like this:
channel 1:

``````
(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3)

Variable containing:
0.1260  0.0841  0.1613
0.1234  0.0847  0.0910
0.1327  0.0748  0.1220
[torch.FloatTensor of size 3x3]

``````

so

``````
(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3).sum()
Variable containing:
1
[torch.FloatTensor of size 1]
``````

and channel 2:

``````
(torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3)

Variable containing:
0.1485  0.0795  0.1152
0.0910  0.0953  0.1136
0.1585  0.0878  0.1106
[torch.FloatTensor of size 3x3]
``````

and therefore

``````
(torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3).sum()

Variable containing:
1
[torch.FloatTensor of size 1]
``````

Oh, OK.
So you want `out.sum(-1).sum(-1)` to be 1!

Then you need to collapse the last two dims together before the softmax:

``````# Variables don't exist anymore, you can remove
# them and just use Tensors everywhere
A = torch.rand(1, 2, 3, 3)

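# Flatten the two spatial dims so softmax runs over all M*N values of each channel
A_view = A.view(1, 2, -1)
out_view = F.softmax(A_view, dim=-1)
# Restore the original BxCxMxN shape
out = out_view.view(A.size())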
``````
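For a general `BxCxMxN` input, the same idea can be wrapped in a small helper. This is only a sketch: `channelwise_softmax` is a name I made up, not a PyTorch function, and I use `reshape` rather than `view` on the assumption that the input might not be contiguous.

```python
import torch
import torch.nn.functional as F

def channelwise_softmax(x):
    """Softmax over the M*N spatial positions of each channel of a BxCxMxN tensor.

    Hypothetical helper for illustration; not part of PyTorch.
    """
    b, c, m, n = x.shape
    flat = x.reshape(b, c, m * n)          # collapse the two spatial dims
    out = F.softmax(flat, dim=-1)          # normalize within each (batch, channel)
    return out.reshape(b, c, m, n)         # restore the original shape

A = torch.rand(2, 3, 4, 5)
out = channelwise_softmax(A)

# Every (batch, channel) slice now sums to 1.
channel_sums = out.sum(-1).sum(-1)         # shape (2, 3)

# Cross-check one channel against the manual formula from the question.
manual = torch.exp(A[0, 1]) / torch.exp(A[0, 1]).sum()

print(channel_sums)
print(torch.allclose(out[0, 1], manual))
```

Unlike the manual `exp(...) / exp(...).sum()` version, `F.softmax` handles all batches and channels in one call.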
