You are describing the expected behavior, as `opt.zero_grad()` will set the `.grad` attribute of the optimized parameters to `None`.
See also the difference between `step`, `backward`, and `zero_grad`.
If you want to change this behavior (i.e., set the gradients to zero instead), see the `set_to_none` parameter of `torch.optim.Optimizer.zero_grad()` here.
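A minimal sketch of the two modes, assuming a toy `nn.Linear` model (the default for `set_to_none` has changed across PyTorch versions, so passing it explicitly is safest):

```python
import torch

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Populate gradients, then clear them by setting .grad to None.
model(torch.randn(4, 2)).sum().backward()
opt.zero_grad(set_to_none=True)
print(model.weight.grad)  # None

# Populate gradients again, then clear them to zero-filled tensors.
model(torch.randn(4, 2)).sum().backward()
opt.zero_grad(set_to_none=False)
print(model.weight.grad)  # tensor of zeros, same shape as the weight
```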
If you want to check that gradients are properly applied, you can implement the following:
- retain a copy of the meta-model (e.g., deep-copy),
- step with an SGD optimizer (without any momentum, etc.),
- compare the difference between the copied model and the updated meta-model against the gradients of the local-model.
If the comparison in step 3 passes `torch.allclose`, then the gradients were applied properly\*.
\*These results will only match exactly if you use an optimizer without momentum, etc., and a learning rate of `1.0`; otherwise they may differ.
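The check above can be sketched as follows for a plain model (the meta-model/local-model split from your setup is omitted; with lr=1.0 and vanilla SGD, the update is exactly `new_param = old_param - grad`):

```python
import copy
import torch

torch.manual_seed(0)

# 1. Retain a copy of the model before the update (deep-copy).
model = torch.nn.Linear(3, 1, bias=False)
reference = copy.deepcopy(model)

# 2. Step with plain SGD (no momentum, no weight decay) and lr=1.0.
opt = torch.optim.SGD(model.parameters(), lr=1.0)
loss = model(torch.randn(8, 3)).pow(2).mean()
loss.backward()
grad = model.weight.grad.clone()
opt.step()

# 3. The parameter difference should equal the stored gradient.
with torch.no_grad():
    diff = reference.weight - model.weight
    print(torch.allclose(diff, grad))  # True
```

If this prints `False`, the gradients reaching the optimizer are not the ones you expect (e.g., they were zeroed, detached, or computed on a different set of parameters).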