Hi Dmitriy!

You are perfectly correct that the derivative of `softplus` is `sigmoid`, and that calling `backward()` should give you this derivative as the gradient with respect to `x`.
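As a sanity check independent of pytorch, the identity d/dx softplus (x) = sigmoid (x) can be confirmed with a plain-Python central finite difference (a minimal sketch; the helper functions here are just for illustration):

```python
import math

def softplus(x):
    # softplus(x) = log(1 + exp(x))
    return math.log1p(math.exp(x))

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

# central finite difference approximates d/dx softplus(x)
h = 1.0e-6
for x in (-0.4578739, 0.9095323, -0.54425339):
    numeric = (softplus(x + h) - softplus(x - h)) / (2.0 * h)
    print(x, numeric, sigmoid(x))  # numeric derivative agrees with sigmoid(x)
```

The three sample points are taken from the first column of your tensor; any values would do.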

Your problem is that, for comparison, you are not calculating `sigmoid (x)` (but, rather, `sigmoid (out)`).
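Concretely, for the first element of your tensor, the two quantities differ (a plain-Python illustration, with the same softplus / sigmoid definitions):

```python
import math

def softplus(x):
    # softplus(x) = log(1 + exp(x))
    return math.log1p(math.exp(x))

def sigmoid(x):
    # sigmoid(x) = 1 / (1 + exp(-x))
    return 1.0 / (1.0 + math.exp(-x))

x = -0.4578739               # first element of the example tensor
print(sigmoid(x))            # ~0.3875 -- this is what x.grad holds
print(sigmoid(softplus(x)))  # ~0.6202 -- this is what your script compared against
```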

Here is a (pytorch 0.3.0) version of your script that makes the correct comparison:

```
import torch
print (torch.__version__)
import numpy as np
x = torch.autograd.Variable (torch.Tensor(np.array([
[-0.4578739 , -0.57322363, -0.85933977],
[ 0.9095323 , 0.78346789, 0.47258139],
[-0.54425339, -0.92374175, -0.56345292]
])), requires_grad = True)
out = torch.nn.Softplus()(x)
# backprop a tensor of ones so that x.grad holds d softplus / dx, elementwise
out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]]))
x.grad                   # should equal sigmoid (x)
torch.nn.Sigmoid()(x)    # the correct comparison
torch.nn.Sigmoid()(out)  # the incorrect comparison from your script
```

And here is its output:

```
>>> import torch
>>> print (torch.__version__)
0.3.0b0+591e73e
>>> import numpy as np
>>>
>>> x = torch.autograd.Variable (torch.Tensor(np.array([
... [-0.4578739 , -0.57322363, -0.85933977],
... [ 0.9095323 , 0.78346789, 0.47258139],
... [-0.54425339, -0.92374175, -0.56345292]
... ])), requires_grad = True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>>
>>> x.grad
Variable containing:
0.3875 0.3605 0.2975
0.7129 0.6864 0.6160
0.3672 0.2842 0.3627
[torch.FloatTensor of size 3x3]
>>> torch.nn.Sigmoid()(x)
Variable containing:
0.3875 0.3605 0.2975
0.7129 0.6864 0.6160
0.3672 0.2842 0.3627
[torch.FloatTensor of size 3x3]
>>> torch.nn.Sigmoid()(out)
Variable containing:
0.6202 0.6099 0.5874
0.7769 0.7613 0.7225
0.6124 0.5828 0.6108
[torch.FloatTensor of size 3x3]
```

You can see that `x.grad` matches `sigmoid (x)`.

Best.

K. Frank