# How to use F.softmax

Hi,

I have a tensor and I want to calculate softmax along the rows of the tensor.

action_values = t.tensor([[-0.4001, -0.2948, 0.1288]])

As I understand it, to apply softmax along the rows of the tensor I need to specify dim as 1.
However, I got unexpected results.

Below is what I tried, but none of it gave the result I expected.

``````
F.softmax(action_values - max(action_values), dim = 0)
Out[15]: tensor([[1., 1., 1.]])

F.softmax(action_values - max(action_values), dim = 1)
Out[16]: tensor([[0.3333, 0.3333, 0.3333]])

F.softmax(action_values - max(action_values), dim = -1)
Out[17]: tensor([[0.3333, 0.3333, 0.3333]])
``````

Hi Granth!

The short answer is that you are calling python’s built-in `max()`
function, rather than pytorch’s `torch.max()` tensor function. As a
result, you are calculating `softmax()` of a tensor that is all zeros.

You have two issues:

First is the use of python’s built-in `max()`. `max()` doesn’t
understand tensors; it simply iterates over the first dimension of
`action_values`, and since there is only one row, it returns that
single row (that is, `action_values` with the singleton dimension
removed). Subtracting this from `action_values` gives all zeros.

The second is that there is no need to subtract a scalar from your
tensor before calling `softmax()`. Any such constant shift drops out
of the `softmax()` calculation anyway.

This script illustrates what is going on:

``````
import torch
torch.__version__
action_values = torch.tensor([[-0.4001, -0.2948, 0.1288]])
action_values
max (action_values)         # this is python's max, not pytorch's
torch.max (action_values)   # pytorch's tensor-version of max
action_values - max (action_values)
action_values - torch.max (action_values)
tzeros = torch.zeros ((1, 3))
tzeros
torch.nn.functional.softmax (tzeros, dim = 0)
torch.nn.functional.softmax (tzeros, dim = 1)
torch.nn.functional.softmax (action_values, dim = 1)         # what you want
torch.nn.functional.softmax (action_values - 2.3, dim = 1)   # shift drops out
``````

Here is its output:

``````
>>> import torch
>>> torch.__version__
'1.6.0'
>>> action_values = torch.tensor([[-0.4001, -0.2948, 0.1288]])
>>> action_values
tensor([[-0.4001, -0.2948,  0.1288]])
>>> max (action_values)         # this is python's max, not pytorch's
tensor([-0.4001, -0.2948,  0.1288])
>>> torch.max (action_values)   # pytorch's tensor-version of max
tensor(0.1288)
>>> action_values - max (action_values)
tensor([[0., 0., 0.]])
>>> action_values - torch.max (action_values)
tensor([[-0.5289, -0.4236,  0.0000]])
>>> tzeros = torch.zeros ((1, 3))
>>> tzeros
tensor([[0., 0., 0.]])
>>> torch.nn.functional.softmax (tzeros, dim = 0)
tensor([[1., 1., 1.]])
>>> torch.nn.functional.softmax (tzeros, dim = 1)
tensor([[0.3333, 0.3333, 0.3333]])
>>> torch.nn.functional.softmax (action_values, dim = 1)         # what you want
tensor([[0.2626, 0.2918, 0.4456]])
>>> torch.nn.functional.softmax (action_values - 2.3, dim = 1)   # shift drops out
tensor([[0.2626, 0.2918, 0.4456]])
``````
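As an aside, if you do want the max-subtraction trick explicitly (say, for a tensor with several rows), subtract the *per-row* maximum with `keepdim = True` so that it broadcasts correctly. A sketch (the second row here is just made-up example data):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[-0.4001, -0.2948,  0.1288],
                  [ 2.0000,  1.0000, -1.0000]])

# per-row maximum, kept as a (2, 1) column so it broadcasts over each row
row_max = x.max(dim=1, keepdim=True).values

# subtracting a per-row constant leaves softmax unchanged
a = F.softmax(x, dim=1)
b = F.softmax(x - row_max, dim=1)
print(torch.allclose(a, b))  # True
```

But, as noted above, you don’t need to do this yourself just to get correct results from `softmax()`.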

Best.

K. Frank