Hi,

I’m trying to implement the algorithm described here.

A short description of the relevant part is as follows.

My attempt is as follows.

```python
# 1. Forward-backward pass on training data
inputs, labels = next(iter(train_loader))
inputs = inputs.to(device=args.device, non_blocking=True)
labels = labels.to(device=args.device, non_blocking=True)
meta_model.load_state_dict(model.state_dict())
y_hat_f = meta_model(inputs)
criterion.reduction = 'none'  # keep per-example losses
l_f = criterion(y_hat_f, labels)
# small random per-example weights
eps = torch.rand(l_f.size(), requires_grad=False, device=args.device).div(1e6)
eps.requires_grad = True
l_f = torch.sum(eps * l_f)
# 2. Compute grads wrt model and update its params
l_f.backward(retain_graph=True)
meta_optimizer.step()
# 3. Forward-backward pass on meta data with updated model
inputs, labels = next(iter(meta_loader))
inputs = inputs.to(device=args.device, non_blocking=True)
labels = labels.to(device=args.device, non_blocking=True)
y_hat_g = model(inputs)
criterion.reduction = 'mean'
l_g = criterion(y_hat_g, labels)
# 4. Compute grads wrt eps and update weights
eps_grads = torch.autograd.grad(l_g, eps)
...
```

At this line:

```
eps_grads = torch.autograd.grad(l_g, eps)
```

I get an error saying:

```
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
```

If I set **allow_unused=True**, it returns **None** for **eps_grads**.

As far as I can tell, autograd loses the computation graph at some point and doesn't retain the information that **eps** was used in computing **l_f**, which was in turn used to update the model's parameters. So **l_g** should be differentiable with respect to **eps**, but it doesn't work here.
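To illustrate what I think is happening, here is a minimal standalone sketch (the parameter and losses are made-up toys, not my actual model): the optimizer's in-place step happens outside autograd, so the later loss ends up with no path back to **eps**:

```python
import torch

# Toy repro: eps weights a "training" loss, the optimizer updates
# the parameter in-place, then a "meta" loss is differentiated
# with respect to eps. All names here are hypothetical.
w = torch.nn.Parameter(torch.tensor([1.0]))
opt = torch.optim.SGD([w], lr=0.1)

eps = torch.zeros(1, requires_grad=True)
l_f = torch.sum(eps * (w * 2.0) ** 2)  # l_f depends on eps
l_f.backward(retain_graph=True)
opt.step()  # in-place update of w, performed outside autograd

l_g = (w * 3.0).sum()  # built only from the (leaf) parameter w
grads = torch.autograd.grad(l_g, eps, allow_unused=True)
print(grads)  # prints (None,): eps is not in l_g's graph
```

If my understanding is right, this is exactly the pattern in my code above, which would explain why **eps_grads** comes back as **None**.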

How can I solve this?