How to do softmax for a BxCxMxN tensor channel-wise

I want to apply softmax to each channel of a tensor, and I was expecting the sum of the elements of each channel to be one, but it is not.
This post shows how to do it for a tensor, but in a batch-wise manner.

Can someone help me with what I should do to apply softmax on each channel so that the sum within each channel is 1?


import torch
from torch.autograd import Variable
import torch.nn.functional as F

A = Variable(torch.rand(1, 2, 3, 3))
print(A)

print(F.softmax(A, dim=0).sum())

print(F.softmax(A, dim=1).sum())

Variable containing:
(0 ,0 ,.,.) =
0.5912 0.3723 0.0399
0.6684 0.8080 0.6185
0.1265 0.2973 0.5427

(0 ,1 ,.,.) =
0.3595 0.4951 0.2176
0.0471 0.8907 0.7543
0.0262 0.8329 0.6792
[torch.FloatTensor of size 1x2x3x3]

Variable containing:
18
[torch.FloatTensor of size 1]

Variable containing:
9
[torch.FloatTensor of size 1]

Hi,

You need to sum across the corresponding dimension, not the whole tensor. So .sum(0) in the first case and .sum(1) in the second.
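For example (a quick sketch with the same 1x2x3x3 shape; plain tensors behave the same as Variables here), summing over the dimension the softmax was taken over should give tensors full of ones:

import torch
import torch.nn.functional as F

A = torch.rand(1, 2, 3, 3)

# Sum over the same dimension that the softmax normalized over
print(F.softmax(A, dim=0).sum(0))  # shape (2, 3, 3), all ones
print(F.softmax(A, dim=1).sum(1))  # shape (1, 3, 3), all ones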

Hello,

Thanks for your clarification, but it still does not answer my question.
Let's look at a simpler example here:

A = Variable(torch.rand(1, 1, 3, 3))
print(A)

Variable containing:
(0 ,0 ,.,.) = 
  0.7806  0.2611  0.7685
  0.3393  0.4488  0.0576
  0.8112  0.4408  0.6531
[torch.FloatTensor of size 1x1x3x3]

How can I apply softmax to tensor A in such a way that the sum of the values is 1?
If I do
F.softmax(A, dim=1)
or
F.softmax(A, dim=0)

it gives me

(0 ,0 ,.,.) = 
  1  1  1
  1  1  1
  1  1  1
[torch.FloatTensor of size 1x1x3x3]

Please note that I used one channel for simplicity. My tensors will have more than one channel, and I want to apply softmax to each channel so that the values of each channel sum to 1 after the softmax.

If there is a single element in the dimension you softmax over, then its value will be 1. This is what happens in this case.
If there is more than one element, then they will sum to 1.
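In your 1x1x3x3 example, both dim=0 and dim=1 have size 1, so each softmax is taken over a single element (a small sketch illustrating this):

A = torch.rand(1, 1, 3, 3)

# dim=1 has size 1, so every softmax is over one element and returns 1
print(F.softmax(A, dim=1))  # tensor of ones, shape (1, 1, 3, 3)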

How should I apply softmax to each channel of a tensor of the form BxCxMxN, then?

Hi,

I think I don’t understand what you want.
Given a Tensor of size BxCxMxN, which of the following should return a Tensor full of ones?

out.sum(1) ?
out.sum(-1) ?
out.sum(-1).sum(-1) ?

I'm sorry for the confusion.
Here is what I want. Consider tensor A:

A = Variable(torch.rand(1, 2, 3, 3))
print(A)

Variable containing:
(0 ,0 ,.,.) = 
  0.5396  0.1361  0.7871
  0.5187  0.1430  0.2143
  0.5917  0.0184  0.5073

(0 ,1 ,.,.) = 
  0.8257  0.2010  0.5715
  0.3362  0.3824  0.5582
  0.8907  0.3006  0.5311
[torch.FloatTensor of size 1x2x3x3]

I would like to apply softmax to each channel, so that the result for each channel would be like this:
channel 1:


(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3)


Variable containing:
 0.1260  0.0841  0.1613
 0.1234  0.0847  0.0910
 0.1327  0.0748  0.1220
[torch.FloatTensor of size 3x3]

so


(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3).sum()
Variable containing:
 1
[torch.FloatTensor of size 1]

and channel 2:


 (torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3)


Variable containing:
 0.1485  0.0795  0.1152
 0.0910  0.0953  0.1136
 0.1585  0.0878  0.1106
[torch.FloatTensor of size 3x3]

and therefore


(torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3).sum()


Variable containing:
 1
[torch.FloatTensor of size 1]

Ah, ok.
So you want out.sum(-1).sum(-1) to be 1!

Then you need to collapse the last dims together before the softmax:

# Variables don't exist anymore, you can remove
# them and just use Tensors everywhere
A = torch.rand(1, 2, 3, 3)

# Collapse the two spatial dims so the softmax runs over all M*N values of a channel
A_view = A.view(1, 2, -1)
out_view = F.softmax(A_view, dim=-1)
out = out_view.view(A.size())
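As a quick check (a sketch generalizing this to an arbitrary BxCxMxN shape), out.sum(-1).sum(-1) should now come out as ones for every batch element and channel:

import torch
import torch.nn.functional as F

B, C, M, N = 4, 2, 3, 3
A = torch.rand(B, C, M, N)

# Flatten each channel, softmax over its M*N values, then restore the shape
out = F.softmax(A.view(B, C, -1), dim=-1).view(A.size())

print(out.sum(-1).sum(-1))  # a BxC tensor of ones (up to float rounding)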