The computation graph breaks in the outer loop of meta-learning; meta-gradients are None (FOMAML)

You are describing expected behavior: by default, opt.zero_grad() sets the .grad attribute of every parameter handled by the optimizer to None.

See also the difference between step, backward, and zero_grad.

If you want to change that behavior (i.e., zero the gradients instead of setting them to None), see the set_to_none parameter of torch.optim.Optimizer.zero_grad().
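
A minimal sketch of the difference, using a toy parameter (the default of set_to_none=True applies to recent PyTorch versions):

```python
import torch

param = torch.nn.Parameter(torch.randn(3))
opt = torch.optim.SGD([param], lr=0.1)

param.sum().backward()
print(param.grad)                 # tensor of ones

opt.zero_grad()                   # default set_to_none=True -> .grad becomes None
print(param.grad)                 # None

param.sum().backward()
opt.zero_grad(set_to_none=False)  # keep a zero-filled tensor instead of None
print(param.grad)                 # tensor([0., 0., 0.])
```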

If you want to check that gradients are properly applied, you can implement the following:

  1. retain a copy of the meta-model (e.g., deep-copy),
  2. step with an SGD optimizer (without any momentum, etc.),
  3. compare the difference between the copied model and the updated meta-model against the gradient of the local model.

If the two match under torch.allclose, then the gradients were applied properly* (see the sketch below).

*This exact match only holds if you use plain SGD (no momentum, weight decay, etc.) with a learning rate of 1.0; with any other optimizer or learning rate the parameter difference will not equal the gradient one-to-one.
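
Here is a minimal sketch of that check. It uses a toy linear layer and loss as stand-ins for your meta-model and outer-loop loss, and compares against the meta-model's own .grad (in FOMAML you would instead compare against the local-model gradients you copied over):

```python
import copy
import torch

meta_model = torch.nn.Linear(4, 2)                       # stand-in for the meta-model
meta_loss = meta_model(torch.randn(8, 4)).pow(2).mean()  # stand-in for the outer-loop loss

# 1. Retain a deep copy of the meta-model before the update.
model_before = copy.deepcopy(meta_model)

# 2. Step with plain SGD (no momentum, no weight decay) and lr=1.0.
opt = torch.optim.SGD(meta_model.parameters(), lr=1.0)
opt.zero_grad()
meta_loss.backward()
opt.step()

# 3. With lr=1.0, (old_param - new_param) should equal the applied gradient exactly.
for (name, p_new), p_old in zip(meta_model.named_parameters(),
                                model_before.parameters()):
    grad_applied = torch.allclose(p_old - p_new, p_new.grad)
    print(f"{name}: gradient applied correctly -> {grad_applied}")
```

If any of the printed checks is False, the gradients reaching the meta-model are not the ones you expect (or are None), which points back to where the outer-loop graph is being broken.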