Why is the softplus backward pass not equal to sigmoid?

Hello, everyone. I can't understand why the backward pass of softplus is not equal to sigmoid.

>>> x = torch.tensor(np.array([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
]), requires_grad=True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.DoubleTensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>> x.grad
tensor([
    [0.3875, 0.3605, 0.2975],
    [0.7129, 0.6864, 0.6160],
    [0.3672, 0.2842, 0.3627]
], dtype=torch.float64)

# Expected output after backward
>>> torch.nn.Sigmoid()(out)
tensor([
    [0.6202, 0.6099, 0.5874],
    [0.7769, 0.7613, 0.7225],
    [0.6124, 0.5828, 0.6108]
], dtype=torch.float64, grad_fn=<SigmoidBackward>)

What am I missing?

Hi Dmitriy!

You are perfectly correct that the derivative of softplus
is sigmoid, and that calling backward() should give
you this derivative as the gradient with respect to x.
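For reference, the derivation is just:

    softplus (x) = log (1 + exp (x)), so
    d/dx softplus (x) = exp (x) / (1 + exp (x)) = sigmoid (x).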

Your problem is that, for the comparison, you are computing
sigmoid (out) rather than sigmoid (x).

Here is a (pytorch 0.3.0) version of your script that makes
the correct comparison:

import torch
print (torch.__version__)
import numpy as np

x = torch.autograd.Variable (torch.Tensor(np.array([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
])), requires_grad = True)
out = torch.nn.Softplus()(x)
out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))

x.grad
torch.nn.Sigmoid()(x)
torch.nn.Sigmoid()(out)

And here is its output:

>>> import torch
>>> print (torch.__version__)
0.3.0b0+591e73e
>>> import numpy as np
>>>
>>> x = torch.autograd.Variable (torch.Tensor(np.array([
...     [-0.4578739 , -0.57322363, -0.85933977],
...     [ 0.9095323 ,  0.78346789,  0.47258139],
...     [-0.54425339, -0.92374175, -0.56345292]
... ])), requires_grad = True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>>
>>> x.grad
Variable containing:
 0.3875  0.3605  0.2975
 0.7129  0.6864  0.6160
 0.3672  0.2842  0.3627
[torch.FloatTensor of size 3x3]

>>> torch.nn.Sigmoid()(x)
Variable containing:
 0.3875  0.3605  0.2975
 0.7129  0.6864  0.6160
 0.3672  0.2842  0.3627
[torch.FloatTensor of size 3x3]

>>> torch.nn.Sigmoid()(out)
Variable containing:
 0.6202  0.6099  0.5874
 0.7769  0.7613  0.7225
 0.6124  0.5828  0.6108
[torch.FloatTensor of size 3x3]

You can see that x.grad matches sigmoid (x).
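If you are on a newer pytorch (0.4 or later), here is a minimal
sketch of the same check using the tensor api (it assumes
torch.sigmoid and torch.allclose, which exist in those versions):

import torch

# same values as in your example
x = torch.tensor([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
], dtype=torch.float64, requires_grad=True)

out = torch.nn.Softplus()(x)
out.backward(torch.ones_like(x))

# the gradient should match sigmoid of the *input*, not of the output
print(torch.allclose(x.grad, torch.sigmoid(x)))    # expect: True
print(torch.allclose(x.grad, torch.sigmoid(out)))  # expect: False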

Best.

K. Frank

Yes, now I see my mistake. Thanks!