I’m trying to use a meta-learning approach to learn the learning rate of my optimization algorithm. An inner-loop optimization (SGD) learns the model parameters, and an outer-loop optimizer learns the learning rate. A regular SGD step has the form:
    for group in self.param_groups:
        grad = torch.autograd.grad(loss, group['params'], create_graph=True)
        for idx, p in enumerate(group['params']):
            with torch.no_grad():
                p.add_(grad[idx], alpha=-group['lr'])
Let's define the parameter `lr` as my learning rate. The way I understand it, for the outer-loop optimization to compute the gradient with respect to this parameter, I need to remove `torch.no_grad()` from my inner-loop optimization, so that backprop sees `lr` in my computational graph. However, doing so results in the following error:
    RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
How should I approach this?
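For reference, here is a minimal toy sketch of the kind of differentiable inner step I'm aiming for. The model, data, and loss here (`w`, `x`, and a squared error) are placeholders I made up, not my actual setup; the point is that an out-of-place update `w - lr * grad_w` keeps `lr` in the graph, whereas the in-place `p.add_()` above does not. I'm unsure how to fold this into the `param_groups` loop of a full optimizer:

```python
import torch

# Toy setup: one parameter tensor, and a learning rate that itself requires grad.
w = torch.tensor([1.0, 2.0], requires_grad=True)
lr = torch.tensor(0.1, requires_grad=True)

x = torch.tensor([3.0, 4.0])
loss = ((w * x).sum() - 1.0) ** 2

# Inner step: create_graph=True so the gradient itself stays differentiable,
# and an out-of-place update so lr remains part of the computational graph.
grad_w = torch.autograd.grad(loss, w, create_graph=True)[0]
w_new = w - lr * grad_w  # new non-leaf tensor, differentiable w.r.t. lr

# Outer loss evaluated at the updated parameters; backprop reaches lr.
outer_loss = ((w_new * x).sum() - 1.0) ** 2
outer_loss.backward()
print(lr.grad)  # gradient of the outer loss w.r.t. the learning rate
```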