Why is the softplus backward pass not equal to sigmoid?

Hello, everyone. I can't understand why the backward pass of softplus is not equal to sigmoid.

>>> x = torch.tensor(np.array([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
]), requires_grad=True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.DoubleTensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>> x.grad
tensor([
    [0.3875, 0.3605, 0.2975],
    [0.7129, 0.6864, 0.6160],
    [0.3672, 0.2842, 0.3627]
], dtype=torch.float64)

# Expected output after backward
>>> torch.nn.Sigmoid()(out)
tensor([
    [0.6202, 0.6099, 0.5874],
    [0.7769, 0.7613, 0.7225],
    [0.6124, 0.5828, 0.6108]
], dtype=torch.float64, grad_fn=<SigmoidBackward>)

What am I missing?

Hi Dmitriy!

You are perfectly correct that the derivative of softplus
is sigmoid, and that calling backward() should give
you this derivative as the gradient with respect to x.
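For reference, the derivation is just:

    softplus (x) = log (1 + exp (x)), so
    d/dx softplus (x) = exp (x) / (1 + exp (x)) = sigmoid (x).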

Your problem is that, for the comparison, you are computing
sigmoid (out) rather than sigmoid (x).

Here is a (pytorch 0.3.0) version of your script that makes
the correct comparison:

import torch
print (torch.__version__)
import numpy as np

x = torch.autograd.Variable (torch.Tensor(np.array([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
])), requires_grad = True)
out = torch.nn.Softplus()(x)
out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))

x.grad
torch.nn.Sigmoid()(x)
torch.nn.Sigmoid()(out)

And here is its output:

>>> import torch
>>> print (torch.__version__)
0.3.0b0+591e73e
>>> import numpy as np
>>>
>>> x = torch.autograd.Variable (torch.Tensor(np.array([
...     [-0.4578739 , -0.57322363, -0.85933977],
...     [ 0.9095323 ,  0.78346789,  0.47258139],
...     [-0.54425339, -0.92374175, -0.56345292]
... ])), requires_grad = True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>>
>>> x.grad
Variable containing:
 0.3875  0.3605  0.2975
 0.7129  0.6864  0.6160
 0.3672  0.2842  0.3627
[torch.FloatTensor of size 3x3]

>>> torch.nn.Sigmoid()(x)
Variable containing:
 0.3875  0.3605  0.2975
 0.7129  0.6864  0.6160
 0.3672  0.2842  0.3627
[torch.FloatTensor of size 3x3]

>>> torch.nn.Sigmoid()(out)
Variable containing:
 0.6202  0.6099  0.5874
 0.7769  0.7613  0.7225
 0.6124  0.5828  0.6108
[torch.FloatTensor of size 3x3]

You can see that x.grad matches sigmoid (x).
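If you are on a newer pytorch (0.4 or later), here is a minimal
sketch of the same check using the tensor api (it assumes
torch.sigmoid and torch.allclose, which exist in those versions):

import torch

# same values as in your example
x = torch.tensor([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 ,  0.78346789,  0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
], dtype=torch.float64, requires_grad=True)

out = torch.nn.Softplus()(x)
out.backward(torch.ones_like(x))

# the gradient should match sigmoid of the *input*, not of the output
print(torch.allclose(x.grad, torch.sigmoid(x)))    # expect: True
print(torch.allclose(x.grad, torch.sigmoid(out)))  # expect: False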

Best.

K. Frank

Yes, now I see my mistake. Thanks!