I’m trying to use a meta-learning approach to learn the learning rate of my optimization algorithm. An inner-loop optimization (SGD) learns the model parameters, and an outer-loop optimization learns the learning rate. A regular SGD step has the form:

```
for group in self.param_groups:
    # Differentiable gradients so later steps can backprop through the update
    grad = torch.autograd.grad(loss, group['params'], create_graph=True)
    for idx, p in enumerate(group['params']):
        with torch.no_grad():
            # In-place SGD update: p <- p - lr * grad
            p.add_(grad[idx], alpha=-group['lr'])
```
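
For context, I expose the learning rate to the outer loop roughly like this (a sketch; the initial value and the choice of Adam as the meta-optimizer are placeholders):

```
import torch

# Learnable learning rate for the outer loop (placeholder initial value).
lr = torch.tensor(1e-2, requires_grad=True)
# Outer-loop optimizer that is supposed to update lr from the meta-loss.
meta_optimizer = torch.optim.Adam([lr], lr=1e-3)
```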

Let's define the parameter `lr` as my learning rate. The way I understand it, for the outer-loop optimization to compute a gradient with respect to this parameter, I need to remove `torch.no_grad()` from my inner-loop optimization, so that backprop sees `lr` in the computational graph. However, doing so results in the following error:

RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
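
Here is a minimal sketch that reproduces the error (the tiny linear model and data are placeholders; `lr` is the learnable tensor from above):

```
import torch

# Placeholder model and data, just to reproduce the error.
lr = torch.tensor(1e-2, requires_grad=True)   # learnable learning rate
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
params = list(model.parameters())

loss = torch.nn.functional.mse_loss(model(x), y)
grad = torch.autograd.grad(loss, params, create_graph=True)

for idx, p in enumerate(params):
    # With torch.no_grad() removed, this in-place update on a leaf
    # parameter that requires grad raises the RuntimeError above.
    p.add_(-lr * grad[idx])
```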

How should I approach this?