How to do softmax for a BxCxMxN tensor channel-wise

I want to apply softmax to each channel of a tensor, and I was expecting the sum of the elements of each channel to be one, but it is not.
This post shows how to do it for a tensor, but in a batch-wise manner.

Can someone help me with what I should do to apply softmax on each channel so that the sum within each channel is 1?


import torch
from torch.autograd import Variable
import torch.nn.functional as F

A = Variable(torch.rand(1, 2, 3, 3))
print(A)

print(F.softmax(A, dim=0).sum())

print(F.softmax(A, dim=1).sum())

Variable containing:
(0 ,0 ,.,.) =
0.5912 0.3723 0.0399
0.6684 0.8080 0.6185
0.1265 0.2973 0.5427

(0 ,1 ,.,.) =
0.3595 0.4951 0.2176
0.0471 0.8907 0.7543
0.0262 0.8329 0.6792
[torch.FloatTensor of size 1x2x3x3]

Variable containing:
18
[torch.FloatTensor of size 1]

Variable containing:
9
[torch.FloatTensor of size 1]

Hi,

You need to sum across the corresponding dimension, not the whole tensor. So .sum(0) in the first case and .sum(1) in the second.
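For example (a quick sketch with the same 1x2x3x3 shape; plain tensors behave the same as Variables here), summing over the dimension the softmax was taken over should give tensors full of ones:

import torch
import torch.nn.functional as F

A = torch.rand(1, 2, 3, 3)

# Sum over the same dimension that the softmax normalized over
print(F.softmax(A, dim=0).sum(0))  # shape (2, 3, 3), all ones
print(F.softmax(A, dim=1).sum(1))  # shape (1, 3, 3), all ones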

Hello,

Thanks for your clarification, but it still does not answer my question.
Let's look at a simpler example here:

A = Variable(torch.rand(1, 1, 3, 3))
print(A)

Variable containing:
(0 ,0 ,.,.) = 
  0.7806  0.2611  0.7685
  0.3393  0.4488  0.0576
  0.8112  0.4408  0.6531
[torch.FloatTensor of size 1x1x3x3]

How can I apply softmax to tensor A in such a way that the sum of the values is 1?
If I do
F.softmax(A, dim=1)
or
F.softmax(A, dim=0)

it gives me

(0 ,0 ,.,.) = 
  1  1  1
  1  1  1
  1  1  1
[torch.FloatTensor of size 1x1x3x3]

Please note that I used one channel for simplicity. My tensors will have more than one channel, and I want to apply softmax to each channel so that the values of each channel sum to 1 after the softmax.

If there is a single element in the dimension you softmax over, then its value will be 1. This is what happens in this case.
If there is more than one element, then they will sum to 1.
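In your 1x1x3x3 example, both dim=0 and dim=1 have size 1, so each softmax is taken over a single element (a small sketch illustrating this):

A = torch.rand(1, 1, 3, 3)

# dim=1 has size 1, so every softmax is over one element and returns 1
print(F.softmax(A, dim=1))  # tensor of ones, shape (1, 1, 3, 3)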

How should I apply softmax to each channel of a tensor of the form BxCxMxN, then?

Hi,

I think I don’t understand what you want.
Given a Tensor of size BxCxMxN, which of the following should return a Tensor full of ones?

out.sum(1) ?
out.sum(-1) ?
out.sum(-1).sum(-1) ?

I'm sorry for the confusion.
Here is what I want. Consider tensor A:

A = Variable(torch.rand(1, 2, 3, 3))
print(A)

Variable containing:
(0 ,0 ,.,.) = 
  0.5396  0.1361  0.7871
  0.5187  0.1430  0.2143
  0.5917  0.0184  0.5073

(0 ,1 ,.,.) = 
  0.8257  0.2010  0.5715
  0.3362  0.3824  0.5582
  0.8907  0.3006  0.5311
[torch.FloatTensor of size 1x2x3x3]

I would like to apply softmax to each channel, so that the result for each channel would be like this:
channel 1:


(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3)


Variable containing:
 0.1260  0.0841  0.1613
 0.1234  0.0847  0.0910
 0.1327  0.0748  0.1220
[torch.FloatTensor of size 3x3]

so


(torch.exp(A[0,0,:,:].view(-1))/torch.exp(A[0,0,:,:].view(-1)).sum()).view(3,3).sum()
Variable containing:
 1
[torch.FloatTensor of size 1]

and channel 2:


 (torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3)


Variable containing:
 0.1485  0.0795  0.1152
 0.0910  0.0953  0.1136
 0.1585  0.0878  0.1106
[torch.FloatTensor of size 3x3]

and therefore


(torch.exp(A[0,1,:,:].view(-1))/torch.exp(A[0,1,:,:].view(-1)).sum()).view(3,3).sum()


Variable containing:
 1
[torch.FloatTensor of size 1]

Ah, ok.
So you want out.sum(-1).sum(-1) to be 1!

Then you need to collapse the last dims together before the softmax:

# Variables don't exist anymore, you can remove
# them and just use Tensors everywhere
A = torch.rand(1, 2, 3, 3)

# Collapse the two spatial dims so the softmax runs over all M*N values of a channel
A_view = A.view(1, 2, -1)
out_view = F.softmax(A_view, dim=-1)
out = out_view.view(A.size())
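As a quick check (a sketch generalizing this to an arbitrary BxCxMxN shape), out.sum(-1).sum(-1) should now come out as ones for every batch element and channel:

import torch
import torch.nn.functional as F

B, C, M, N = 4, 2, 3, 3
A = torch.rand(B, C, M, N)

# Flatten each channel, softmax over its M*N values, then restore the shape
out = F.softmax(A.view(B, C, -1), dim=-1).view(A.size())

print(out.sum(-1).sum(-1))  # a BxC tensor of ones (up to float rounding)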