Hi,
I’m trying to implement the algorithm described here.
A short description of the relevant part is as follows.
My attempt is as follows.
# 1. Forward-backward pass on training data
inputs, labels = next(iter(train_loader))
inputs = inputs.to(device=args.device, non_blocking=True)
labels = labels.to(device=args.device, non_blocking=True)
meta_model.load_state_dict(model.state_dict())
y_hat_f = meta_model(inputs)
criterion.reduction = 'none'
l_f = criterion(y_hat_f, labels)
eps = (torch.rand(l_f.size(), device=args.device) / 1e6).requires_grad_()
l_f = torch.sum(eps * l_f)
# 2. Compute grads wrt model and update its params
l_f.backward(retain_graph=True)
meta_optimizer.step()
# 3. Forward-backward pass on meta data with updated model
inputs, labels = next(iter(meta_loader))
inputs = inputs.to(device=args.device, non_blocking=True)
labels = labels.to(device=args.device, non_blocking=True)
y_hat_g = model(inputs)
criterion.reduction = 'mean'
l_g = criterion(y_hat_g, labels)
# 4. Compute grads wrt eps and update weights
eps_grads = torch.autograd.grad(l_g, eps)
.....
At this line:
eps_grads = torch.autograd.grad(l_g, eps)
I get the following error:
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
If I set allow_unused=True, the gradient returned for eps is just None.
As far as I can tell, autograd loses the computation graph at some point and doesn't retain the fact that eps was used in computing l_f, which in turn was used to update the model's parameters. So l_g should be differentiable w.r.t. eps, but it doesn't work here.
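If my understanding is right, the failure can be reproduced in isolation (toy names below are my own, with a single weight tensor w standing in for the model parameters): after the in-place optimizer.step(), the updated parameters carry no autograd history, so the later loss has no path back to eps.

```python
import torch

# Hypothetical stand-ins: w plays the role of the model parameters,
# eps the per-sample loss weights.
w = torch.ones(3, requires_grad=True)
eps = torch.zeros(3, requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

l_f = torch.sum(eps * w ** 2)   # eps clearly participates in l_f
l_f.backward(retain_graph=True)
opt.step()                      # in-place parameter update, no grad tracking

l_g = (w ** 2).mean()           # loss computed from the *updated* w
try:
    torch.autograd.grad(l_g, eps)
except RuntimeError as err:
    print(err)                  # same "appears to not have been used" error
```

So the graph of l_g starts from the leaf tensor w itself rather than from the update step that involved eps, which matches the error I'm seeing.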
How can I solve this?