You are describing the expected behavior, as opt.zero_grad() will set the .grad attributes of the optimized parameters to None.
See also the difference between step, backward, and zero_grad.
If you want to change this behavior (i.e., fill the gradients with zeros instead of setting them to None), see the set_to_none parameter of torch.optim.Optimizer.zero_grad here.
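Here is a minimal sketch, using a toy nn.Linear model as an assumption on my side, that shows the difference between the two modes (note that the default value of set_to_none has changed across PyTorch releases):

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# accumulate a gradient
model(torch.randn(1, 4)).sum().backward()

opt.zero_grad(set_to_none=True)   # gradients are freed and become None
print(model.weight.grad)          # prints: None

model(torch.randn(1, 4)).sum().backward()
opt.zero_grad(set_to_none=False)  # gradients are kept as tensors filled with zeros
print(model.weight.grad)          # prints: a tensor of zeros
```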
If you want to check that gradients are properly applied, you can implement the following:
1. retain a copy of the meta-model (e.g., via copy.deepcopy),
2. take an optimizer step with plain SGD (no momentum, weight decay, etc.),
3. compare the difference between the updated meta-model and the copied model against the gradient of the local-model.
If the comparison in step 3 passes torch.allclose, then the gradients were applied properly* (see the sketch below).
*The parameter difference will only match the gradient exactly if you use a plain SGD optimizer (no momentum, weight decay, etc.) with a learning rate of 1.0; for any other learning rate, scale the gradient by the learning rate before comparing.
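A minimal sketch of this check, using a plain nn.Linear as a stand-in for the meta-model (the model and shapes are assumptions; in your setup the gradients would come from the local-model's backward pass):

```python
import copy
import torch
import torch.nn as nn

meta_model = nn.Linear(4, 2)              # stand-in for the meta-model
reference = copy.deepcopy(meta_model)     # step 1: retain a copy

# step 2: plain SGD (no momentum, no weight decay) with lr=1.0
opt = torch.optim.SGD(meta_model.parameters(), lr=1.0)
meta_model(torch.randn(8, 4)).sum().backward()
opt.step()

# step 3: with lr=1.0, (old param - updated param) should equal the gradient
with torch.no_grad():
    for (name, p_new), p_old in zip(meta_model.named_parameters(),
                                    reference.parameters()):
        assert torch.allclose(p_old - p_new, p_new.grad), f"mismatch in {name}"
print("gradients were applied as expected")
```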