I was reading the docs (Top-Level Functions — higher 0.2.1 documentation) for the context manager (higher.innerloop_ctx) and it said:
higher.innerloop_ctx(model, opt, device=None, copy_initial_weights=True, override=None, track_higher_grads=True)
…
copy_initial_weights – if true, the weights of the patched module are copied to form the initial weights of the patched module, and thus are not part of the gradient tape when unrolling the patched module. If this is set to False, the actual module weights will be the initial weights of the patched module. This is useful when doing MAML, for example.
but I don’t understand what that means. Perhaps that is because English is not my first language.
What is confusing to me is that I thought the whole point of this library is to be able to differentiate the unrolled optimizer. Thus, not being able to have the weights as part of the unrolled optimization seems strange to me. Why would anyone want to set this flag to true…ever? So I don’t understand what:
thus are not part of the gradient tape when unrolling the patched module
means, I guess.
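To make the question concrete, here is a minimal sketch of what I think the quoted sentence might be describing. It uses plain PyTorch rather than higher itself, so the function name `unrolled_meta_grad`, the toy losses, and especially the detach-and-clone step standing in for "copied" are my own assumptions, not the library's actual code:

```python
import torch


def unrolled_meta_grad(copy_initial_weights: bool):
    """Unroll one SGD step by hand and return the gradient that reaches w0."""
    # w0 stands in for the "actual module weights" (hypothetical toy values).
    w0 = torch.tensor([1.0, -2.0], requires_grad=True)
    x = torch.tensor([0.5, 1.5])

    if copy_initial_weights:
        # Assumed meaning of "copied": the unroll starts from a detached
        # clone, so w0 itself is never recorded on the gradient tape.
        w = w0.detach().clone().requires_grad_(True)
    else:
        # The unroll starts from w0 itself, so w0 is on the tape.
        w = w0

    # One differentiable inner-loop SGD step (create_graph keeps it on the tape).
    inner_loss = ((w * x).sum() - 1.0) ** 2
    (g,) = torch.autograd.grad(inner_loss, w, create_graph=True)
    w1 = w - 0.1 * g

    # Outer (meta) loss evaluated at the updated weights.
    outer_loss = (w1 ** 2).sum()
    outer_loss.backward()
    return w0.grad


print(unrolled_meta_grad(True))   # gradient reaching w0 when the start is a copy
print(unrolled_meta_grad(False))  # gradient reaching w0 when the start is w0 itself
```

In this sketch the unrolled step is differentiable in both cases; the only difference is whether the gradient of the outer loss ever arrives at `w0`. If that reading is right, the copied case leaves `w0.grad` unset, and the non-copied case is the one where the initialization itself can be trained, as in MAML.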
I do understand what MAML is supposed to be.
Related: