Second-order derivatives in meta-learning

I have a similar doubt. What I have found so far is that nn.Parameter tensors are leaf nodes with no history, so the gradient descent (GD) update you are performing in the inner loop (adapted_params[key] = val - meta_step_size * grad) won’t be recorded in the computation graph, and the second-order gradients needed for the outer (meta) update won’t flow back to the original parameters.
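For what it’s worth, here is a minimal sketch of the graph-preserving alternative (the model, data and meta_step_size below are just placeholders, not your actual setup): compute the inner-loop gradients with create_graph=True, keep the adapted parameters as plain tensors instead of writing them back into nn.Parameters, and run the adapted forward pass functionally, e.g. with torch.func.functional_call (available in recent PyTorch versions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # recent PyTorch; older versions need a manual functional forward

# Placeholder model and data, only to illustrate the pattern.
model = nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
meta_step_size = 0.1

# Inner-loop loss on the original (leaf) parameters.
inner_loss = F.mse_loss(model(x), y)
grads = torch.autograd.grad(inner_loss, model.parameters(), create_graph=True)

# The adapted parameters are plain (non-leaf) tensors whose history
# contains the gradient step, so second-order derivatives can flow
# through them. Assigning them back into nn.Parameter objects (or
# using .data / in-place updates) would cut that history.
adapted_params = {
    name: p - meta_step_size * g
    for (name, p), g in zip(model.named_parameters(), grads)
}

# Evaluate the model with the adapted parameters functionally,
# then backprop the outer loss to the original leaf parameters.
outer_loss = F.mse_loss(functional_call(model, adapted_params, (x,)), y)
outer_loss.backward()
```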

The following may help you understand in more detail why updating model parameters the way you have won’t work in PyTorch:

In fact, you may want to use the higher package, or refer to the following code for a workaround (look for MetaModule in the repositories):
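If you go the higher route, the usual pattern looks roughly like the sketch below (the model, optimizers, data and step counts are placeholders, not a full MAML implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import higher  # pip install higher

# Placeholder model, optimizers and data, only to show the pattern.
model = nn.Linear(4, 1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)
x_support, y_support = torch.randn(8, 4), torch.randn(8, 1)
x_query, y_query = torch.randn(8, 4), torch.randn(8, 1)

meta_opt.zero_grad()
# innerloop_ctx returns a functional copy of the model (fmodel) and a
# differentiable optimizer (diffopt); the updates made by diffopt.step
# are recorded in the graph, so second-order gradients reach model.
with higher.innerloop_ctx(model, inner_opt, copy_initial_weights=False) as (fmodel, diffopt):
    for _ in range(5):  # inner adaptation steps
        inner_loss = F.mse_loss(fmodel(x_support), y_support)
        diffopt.step(inner_loss)
    outer_loss = F.mse_loss(fmodel(x_query), y_query)
    outer_loss.backward()  # backprops through the inner-loop updates
meta_opt.step()
```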

I hope that helps you understand the problem with your method.