Implement a meta-trainable step size

I want to implement a (meta) trainable step size. I tried it with this post:

and with the higher library (facebookresearch/higher, a PyTorch library for obtaining higher-order gradients over losses spanning training loops rather than individual training steps), with no luck…

I tried:

import itertools
import torch
import higher

# child_model and testloader are defined earlier in my setup
eta = torch.tensor([0.5], requires_grad=True).view(1)
inner_opt = torch.optim.Adam(child_model.parameters(), lr=eta)
# meta_params = itertools.chain(child_model.parameters(), eta.parameters())
meta_params = itertools.chain(child_model.parameters())
meta_opt = torch.optim.Adam(meta_params, lr=1e-3)
# do meta-training/outer training; minimize the outer loop:
#   min_{theta} sum_t L^val( theta^{T} - eta * grad L^train(theta^{T}) )
nb_outer_steps = 10  # note: here this equals the number of meta-train steps (it could differ depending on how you loop through the val set)
for outer_i, (outer_inputs, outer_targets) in enumerate(testloader, 0):
    meta_opt.zero_grad()
    if outer_i >= nb_outer_steps:
        break
    # do inner-training/MAML; minimize the inner loop: theta^{T} - eta * grad L^train(theta^{T}) ~ argmin L^train(theta)
    nb_inner_steps = 3
    with higher.innerloop_ctx(child_model, inner_opt) as (fmodel, diffopt):
        ...  # inner-loop body omitted

with error:

Exception has occurred: RuntimeError
Only Tensors created explicitly by the user (graph leaves) support the deepcopy protocol at the moment
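
My guess (I'm not certain) is that the deepcopy complaint comes from eta itself: after the .view(1) call it is no longer a leaf tensor, and it sits inside inner_opt's param groups as the lr, which higher apparently ends up copying. A tiny check of what I mean by leaf vs. non-leaf:

import copy
import torch

eta_leaf = torch.tensor([0.5], requires_grad=True)  # created by the user -> leaf
eta_view = eta_leaf.view(1)                         # output of an op -> not a leaf
print(eta_leaf.is_leaf, eta_view.is_leaf)           # True False

copy.deepcopy(eta_leaf)    # fine
# copy.deepcopy(eta_view)  # raises the same "graph leaves" RuntimeError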

Even if that ran, it wouldn't really be what I want, because eta could suddenly become negative; I'd really like to constrain it with a sigmoid, but I had to try something first…
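
What I'm hoping for is roughly this: keep an unconstrained meta-parameter and feed sigmoid of it into the inner optimizer as the learning rate, so the step size stays in (0, 1). Below is a rough sketch of what I mean, not something I've verified end-to-end; it relies on higher's override argument to innerloop_ctx (which, as far as I can tell, lets you pass a differentiable tensor as the lr), and the tiny model and data are just placeholders:

import torch
import torch.nn as nn
import higher

# placeholder model and data, only to make the sketch self-contained
child_model = nn.Linear(4, 1)
x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)
x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)
loss_fn = nn.MSELoss()

# unconstrained meta-parameter; the actual step size is sigmoid(raw_eta),
# which stays in (0, 1) and can never go negative (sigmoid(0) = 0.5, like my initial eta)
raw_eta = nn.Parameter(torch.tensor(0.0))

inner_opt = torch.optim.SGD(child_model.parameters(), lr=0.1)  # this lr is a dummy; it gets overridden below
meta_opt = torch.optim.Adam(list(child_model.parameters()) + [raw_eta], lr=1e-3)

nb_outer_steps = 10
nb_inner_steps = 3
for outer_i in range(nb_outer_steps):
    meta_opt.zero_grad()
    eta = torch.sigmoid(raw_eta)  # differentiable step size in (0, 1)
    # copy_initial_weights=False so outer gradients also reach child_model's initial weights (MAML-style);
    # override={'lr': [eta]} replaces the optimizer's lr with the differentiable eta
    with higher.innerloop_ctx(child_model, inner_opt,
                              copy_initial_weights=False,
                              override={'lr': [eta]}) as (fmodel, diffopt):
        for _ in range(nb_inner_steps):
            inner_loss = loss_fn(fmodel(x_train), y_train)
            diffopt.step(inner_loss)
        outer_loss = loss_fn(fmodel(x_val), y_val)
        outer_loss.backward()  # fills grads for child_model's params and for raw_eta
    meta_opt.step()
    print(outer_i, torch.sigmoid(raw_eta).item())  # watch the constrained step size change

If that's roughly the right way to use override, I think the deepcopy problem should also go away, since raw_eta is a proper leaf tensor and is never stored inside inner_opt's param groups. But I'm not sure this is how it's meant to be done.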


Related:

Stack Overflow: How does one implement a meta-trainable step size in Pytorch?