# LogSoftmax vs Softmax

Hi Daniel!

I think you are correct. I would call this round-off error (where,
numerically, `(1.0 + delta) - 1.0` becomes exactly floating-point
zero somewhere around `delta = 1.e-16` (for double precision)).

To me, underflow is where a very small `epsilon` becomes exactly
floating-point zero somewhere around `epsilon = 1.e-324` (for
double precision).
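
(For concreteness, both effects are easy to see with plain python floats, which are
double precision. This is just an illustrative sketch:)

```
delta = 1.e-16
print ((1.0 + delta) - 1.0)   # 0.0 -- delta is lost to round-off when added to 1.0

epsilon = 1.e-324
print (epsilon == 0.0)        # True -- epsilon underflows to exactly zero
print (5.e-324 == 0.0)        # False -- the smallest subnormal double is about 5.e-324
```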

The problem is that for small `delta`, `exp (delta) ~ 1.0 + delta`,
so you get exactly this kind of round-off error.

Note that many math libraries, including pytorch, implement the
`expm1()` function to address this issue.
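
(In current pytorch versions `torch.expm1()` is built in; here is a quick,
illustrative sketch of the difference it makes:)

```
import math
import torch

delta = 1.e-16
print (math.exp (delta) - 1.0)   # 0.0 -- exp (delta) rounds to exactly 1.0, so the subtraction loses everything
print (math.expm1 (delta))       # 1e-16 -- expm1 keeps full precision for small arguments
print (torch.expm1 (torch.tensor ([delta], dtype = torch.float64)))   # tensor([1.0000e-16], dtype=torch.float64)
```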

(I don’t think this helps with `Softmax` or `LogSoftmax`, though, because
in this case you end up with results of order 1 anyway.)
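
(The numerical issue that does matter for `LogSoftmax` is a different one:
computing `log (softmax (x))` in two separate steps can underflow to zero and
then give `-inf`, whereas `log_softmax()` does the whole computation in one
shifted step. A small sketch with current pytorch:)

```
import torch

x = torch.tensor ([0.0, -1000.0])                # logits with a large gap

print (torch.log (torch.softmax (x, dim = 0)))   # tensor([0., -inf]) -- softmax underflows to 0.0, then log() blows up
print (torch.log_softmax (x, dim = 0))           # tensor([0., -1000.]) -- stays finite
```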

This (0.3.0) script illustrates the round-off error issue and the `expm1()`
function:

```
import torch
torch.__version__

import math

def expm1 (t):   # not yet implemented in 0.3.0
    res = torch.zeros_like (t)
    for i in range (t.shape[0]):
        res[i] = math.expm1 (t[i])   # double precision, then truncated, if FloatTensor
    return res

z = torch.DoubleTensor ([1.e-15, 2.e-15, 3.e-15])

z_max = torch.max (z)

torch.set_printoptions (precision = 20)

expm1 (z)                             # correct to about 15 decimal digits
expm1 (z - z_max)                     # correct to about 15 decimal digits

expm1 (z.float())                     # not exactly single precision
expm1 (z.float() - z_max)             # not exactly single precision

torch.exp (z) - 1.0                   # double precision (without expm1)
torch.exp (z - z_max) - 1.0           # double precision (without expm1)

torch.exp (z.float()) - 1.0           # single precision (without expm1)
torch.exp (z.float() - z_max) - 1.0   # single precision (without expm1)
```

Here is the output:

```
>>> import torch
>>> torch.__version__
'0.3.0b0+591e73e'
>>>
>>> import math
>>>
>>> def expm1 (t):   # not yet implemented in 0.3.0
...     res  = torch.zeros_like (t)
...     for i in range (t.shape[0]):
...         res[i] = math.expm1 (t[i])   # double precision, then truncated, if FloatTensor
...     return res
...
>>> z = torch.DoubleTensor ([1.e-15, 2.e-15, 3.e-15])
>>>
>>> z_max = torch.max (z)
>>>
>>> torch.set_printoptions (precision = 20)
>>>
>>> expm1 (z)                             # correct to about 15 decimal digits

1.00000e-15 *
1.00000000000000066613
2.00000000000000177636
3.00000000000000444089
[torch.DoubleTensor of size 3]

>>> expm1 (z - z_max)                     # correct to about 15 decimal digits

1.00000e-15 *
-1.99999999999999755751
-0.99999999999999900080
0.00000000000000000000
[torch.DoubleTensor of size 3]

>>>
>>> expm1 (z.float())                     # not exactly single precision

1.00000e-15 *
1.00000000362749363880
2.00000000725498727761
2.99999990500336233268
[torch.FloatTensor of size 3]

>>> expm1 (z.float() - z_max)             # not exactly single precision

1.00000e-15 *
-1.99999979549675055424
-0.99999989774837527712
0.00000000000000000000
[torch.FloatTensor of size 3]

>>>
>>> torch.exp (z) - 1.0                   # double precision (without expm1)

1.00000e-15 *
1.11022302462515654042
1.99840144432528155072
3.10862446895043786910
[torch.DoubleTensor of size 3]

>>> torch.exp (z - z_max) - 1.0           # double precision (without expm1)

1.00000e-15 *
-1.99840144432528155072
-0.99920072216264077536
0.00000000000000000000
[torch.DoubleTensor of size 3]
```

Best.

K. Frank

Why, then, does the PyTorch documentation show an example like this:

```
>>> # Example of target with class indices
>>> loss = nn.CrossEntropyLoss()
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.empty(3, dtype=torch.long).random_(5)
>>> output = loss(input, target)
>>> output.backward()
>>>
>>> # Example of target with class probabilities
>>> input = torch.randn(3, 5, requires_grad=True)
>>> target = torch.randn(3, 5).softmax(dim=1)
>>> output = loss(input, target)
>>> output.backward()
```

It is very confusing !!
Also see https://pytorch.org/docs/stable/generated/torch.nn.functional.softmax.html:

> This function doesn’t work directly with NLLLoss, which expects the Log to be computed between the Softmax and itself. Use log_softmax instead (it’s faster and has better numerical properties).

Why does the CrossEntropyLoss example show softmax as an alternative ??

`CrossEntropyLoss(x) = NLLLoss(LogSoftmax(x))`
`LogSoftmax(x) = Log(Softmax(x))`
`Softmax(x) != LogSoftmax(x)`
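
(For what it's worth, these identities are straightforward to check numerically
with integer class targets and default settings; a small sketch:)

```
import torch
import torch.nn as nn

torch.manual_seed(0)

input = torch.randn(3, 5)
target = torch.empty(3, dtype=torch.long).random_(5)

ce = nn.CrossEntropyLoss()(input, target)
nll_log_softmax = nn.NLLLoss()(torch.log_softmax(input, dim=1), target)
nll_softmax = nn.NLLLoss()(torch.softmax(input, dim=1), target)

print(torch.allclose(ce, nll_log_softmax))   # True  -- CrossEntropyLoss == NLLLoss(LogSoftmax(x))
print(torch.allclose(ce, nll_softmax))       # False -- plain Softmax is not a substitute
```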

Why is there such inconsistency in the documentation ?
The examples for CrossEntropyLoss should be fixed !!

Hi Denis!

At issue is that some new functionality has been added to pytorch’s
`CrossEntropyLoss` as of pytorch version 1.10.

Compare the documentation for `CrossEntropyLoss` in versions 1.9 and 1.10.

As of version 1.10, `CrossEntropyLoss` accepts probabilistic `target`s (that
are floating-point numbers), in addition to integer-class-label `target`s.

The sole purpose of `softmax()` in this example is to generate a `target`
that is a legitimate probability distribution across the class dimension. Note
that `softmax()` is not being applied to `input`.
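
(If it helps, here is a small sketch, using the default "mean" reduction and no
class weights or label smoothing, of what `CrossEntropyLoss` computes for such
a probabilistic `target`:)

```
import torch
import torch.nn as nn

torch.manual_seed (0)

loss = nn.CrossEntropyLoss ()

input = torch.randn (3, 5, requires_grad = True)
target = torch.randn (3, 5).softmax (dim = 1)   # rows are valid probability distributions

print (target.sum (dim = 1))                    # each row sums to (numerically) one

output = loss (input, target)

# matches the explicit per-sample cross entropy, averaged over the batch
manual = -(target * torch.log_softmax (input, dim = 1)).sum (dim = 1).mean ()
print (torch.allclose (output, manual))         # True
```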

Best.

K. Frank
