Grad is None in net.state_dict(), but not in net.named_parameters(), why?

Why do I get None gradients from net.state_dict(), as shown below?
Shouldn't the gradient values from net.state_dict() be the same as those from net.named_parameters()?
Note: I am referring to the documented state_dict() and named_parameters() methods.
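
For reference, the snippet below assumes net, x, y, and loss_fn already exist. A minimal setup consistent with the gradient shapes in the output further down (the Tanh activation, batch size, and MSE loss are guesses, not part of the original post):

import torch
import torch.nn as nn

# assumed model: matches the [5, 2], [5], [1, 5], [1] gradient shapes below
net = nn.Sequential(nn.Linear(2, 5), nn.Tanh(), nn.Linear(5, 1))
loss_fn = nn.MSELoss()  # assumed loss
x = torch.randn(8, 2)   # dummy inputs
y = torch.randn(8, 1)   # dummy targets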

net.zero_grad()
ypred = net(x)
loss = loss_fn(ypred, y)
loss.backward()

print('### net.named_parameters():')
for name, param in net.named_parameters():
    print('param.grad=', param.grad)

print('### net.state_dict():')
for key, val in net.state_dict().items():
    print('val.grad=', val.grad)

Output:

param.grad= tensor(1.00000e-02 *
       [[ 0.1781,  0.1962],
        [-1.3298, -1.9067],
        [-1.8645, -1.9591],
        [ 1.4285,  1.6071],
        [-1.3251, -1.3051]])
param.grad= tensor(1.00000e-02 *
       [ 0.3545, -3.0727, -3.7230,  2.9359, -2.5023])
param.grad= tensor([[-0.4717, -0.4988, -0.3780, -0.2974, -0.2383]])
param.grad= tensor([-0.7368])
### net.state_dict():
val.grad= None
val.grad= None
val.grad= None
val.grad= None

Thank you.

Try passing keep_vars=True, i.e. net.state_dict(keep_vars=True), and it should work.
Since the default is False, state_dict() returns the underlying data of each tensor, detached from the variable, which is why .grad is None.
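
For example, with the snippet above:

print('### net.state_dict(keep_vars=True):')
for key, val in net.state_dict(keep_vars=True).items():
    # keep_vars=True returns the parameters themselves, still attached
    # to autograd, so .grad is populated after loss.backward()
    print('val.grad=', val.grad)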


Thanks @ptrblck!
Btw, the docs should be updated with this info (plus the fact that state_dict() returns an OrderedDict) 🙂
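
A quick check of the OrderedDict point (assuming the same net as above):

from collections import OrderedDict

sd = net.state_dict()
print(type(sd))                     # <class 'collections.OrderedDict'>
print(isinstance(sd, OrderedDict))  # True
print(list(sd.keys()))              # parameter names in registration order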