Learning the learning rate via meta-learning

I’m trying to use a meta-learning approach to learn the learning rate of my optimization algorithm. An inner-loop optimization (SGD) learns the model parameters, and an outer-loop optimization learns the learning rate. A regular SGD step has the form:

for group in self.param_groups:
    # create_graph=True so that higher-order gradients can flow through this step
    grad = torch.autograd.grad(loss, group['params'], create_graph=True)
    for idx, p in enumerate(group['params']):
        with torch.no_grad():
            # plain in-place SGD update: p <- p - lr * grad
            p.add_(grad[idx], alpha=-group['lr'])
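
For context, the overall structure I have in mind is roughly the following. This is only a sketch; the names (meta_opt, val_loss, the toy data) are placeholders of mine, not from any library:

import torch
import torch.nn.functional as F

model = torch.nn.Linear(10, 1)
lr = torch.tensor(0.01, requires_grad=True)      # the learning rate I want to meta-learn
meta_opt = torch.optim.Adam([lr], lr=1e-3)       # outer-loop optimizer over lr

x_train, y_train = torch.randn(32, 10), torch.randn(32, 1)
x_val, y_val = torch.randn(32, 10), torch.randn(32, 1)

# inner loop: one SGD step on the training loss, keeping the graph
train_loss = F.mse_loss(model(x_train), y_train)
grads = torch.autograd.grad(train_loss, list(model.parameters()), create_graph=True)
# ... apply the update using lr here (the step this question is about) ...

# outer loop: evaluate the updated model on validation data and step on lr
# val_loss = F.mse_loss(updated_model(x_val), y_val)
# meta_opt.zero_grad(); val_loss.backward(); meta_opt.step()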

Let's call the learning rate lr (i.e. group['lr'] above). The way I understand it, for the outer-loop optimization to compute a gradient with respect to lr, I need to remove torch.no_grad() from the inner-loop update, so that backprop sees lr in the computational graph. However, doing so results in the following error:

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
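
For reference, the inner-loop update after that change looks roughly like this (group['lr'] is now a tensor with requires_grad=True; this is the version that fails):

for group in self.param_groups:
    grad = torch.autograd.grad(loss, group['params'], create_graph=True)
    for idx, p in enumerate(group['params']):
        # no torch.no_grad() here, so that lr stays in the graph
        p.add_(-group['lr'] * grad[idx])   # <- raises the RuntimeError above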

How should I approach this?