Hi Dmitriy!
You are perfectly correct that the derivative of softplus is sigmoid, and that calling backward() should give you this derivative as the gradient with respect to x.
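For completeness, the one-line derivation: since softplus(x) = log(1 + exp(x)),

    d softplus(x) / dx = exp(x) / (1 + exp(x)) = 1 / (1 + exp(-x)) = sigmoid(x).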
Your problem is that, for the comparison, you are not calculating sigmoid(x), but rather sigmoid(out).
Here is a (PyTorch 0.3.0) version of your script that makes the correct comparison:
import torch
print (torch.__version__)
import numpy as np
# x is the leaf Variable we differentiate with respect to
x = torch.autograd.Variable (torch.Tensor(np.array([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 , 0.78346789, 0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
])), requires_grad = True)
out = torch.nn.Softplus()(x)
# out is not a scalar, so backward() needs a gradient argument;
# all ones gives the gradient of out.sum() with respect to x
out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
x.grad                    # should equal sigmoid (x)
torch.nn.Sigmoid()(x)     # the correct comparison
torch.nn.Sigmoid()(out)   # what your script was computing
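As an aside, passing the all-ones tensor to backward() computes the gradient of out.sum() with respect to x. An equivalent way to write the backward pass is sketched below, using a hypothetical fresh Variable x2 so that gradients don't accumulate on top of x.grad:

import torch
x2 = torch.autograd.Variable (torch.randn (3, 3), requires_grad = True)
out2 = torch.nn.Softplus()(x2)
out2.sum().backward()   # scalar result, so no gradient argument is needed
# x2.grad now equals torch.nn.Sigmoid()(x2)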
And here is the script's output:
>>> import torch
>>> print (torch.__version__)
0.3.0b0+591e73e
>>> import numpy as np
>>>
>>> x = torch.autograd.Variable (torch.Tensor(np.array([
... [-0.4578739 , -0.57322363, -0.85933977],
... [ 0.9095323 , 0.78346789, 0.47258139],
... [-0.54425339, -0.92374175, -0.56345292]
... ])), requires_grad = True)
>>> out = torch.nn.Softplus()(x)
>>> out.backward(torch.Tensor([[1., 1., 1.], [1., 1., 1.], [1.,1.,1.]]))
>>>
>>> x.grad
Variable containing:
0.3875 0.3605 0.2975
0.7129 0.6864 0.6160
0.3672 0.2842 0.3627
[torch.FloatTensor of size 3x3]
>>> torch.nn.Sigmoid()(x)
Variable containing:
0.3875 0.3605 0.2975
0.7129 0.6864 0.6160
0.3672 0.2842 0.3627
[torch.FloatTensor of size 3x3]
>>> torch.nn.Sigmoid()(out)
Variable containing:
0.6202 0.6099 0.5874
0.7769 0.7613 0.7225
0.6124 0.5828 0.6108
[torch.FloatTensor of size 3x3]
You can see that x.grad matches sigmoid(x).
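If you happen to be on a more recent version of PyTorch (0.4+, where Variable has been merged into Tensor), here is a minimal sketch of the same check; using torch.allclose for the comparison is just one convenient choice:

import torch
x = torch.tensor ([
    [-0.4578739 , -0.57322363, -0.85933977],
    [ 0.9095323 , 0.78346789, 0.47258139],
    [-0.54425339, -0.92374175, -0.56345292]
], requires_grad = True)
out = torch.nn.functional.softplus (x)
out.backward (torch.ones_like (out))          # gradient of out.sum() w.r.t. x
print (torch.allclose (x.grad, torch.sigmoid (x)))   # prints True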
Best.
K. Frank