I’m trying to compute the derivative of the error with respect to the learning rate (see the code below). I set up the learning rate as a tensor with gradient tracking enabled and then use it to update the parameters. My understanding is that passing create_graph=True lets autograd take higher-order derivatives, but instead I get: RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior. The learning rate *is* used in the update, and setting allow_unused=True just returns None. Is there a workaround for differentiating the error with respect to hyperparameters like the learning rate, or am I setting something up wrong?
I know this may be slow and probably isn’t the intended usage, but is there any way to get a gradient with respect to the learning rate in PyTorch without resorting to numerical derivatives? I thought there might be a way to apply the updates to a copy of the model’s parameters, but I don’t know how to get autograd to track that.
import torch
import torch.nn as nn

model = nn.Linear(2, 2, bias=True)
x = torch.rand(10, 2)
y = x + 0.6
lr = nn.Parameter(torch.tensor(0.01))  # nn.Parameter already sets requires_grad=True
func_loss = torch.nn.MSELoss()

err = func_loss(model(x), y)
grad = torch.autograd.grad(err, model.parameters(), create_graph=True)
for param, g in zip(model.parameters(), grad):
    # same as before (param -= lr * g):
    # RuntimeError: a leaf Variable that requires grad is being used in an in-place operation.
    param = param - lr * g

err = func_loss(model(x), y)
print(torch.autograd.grad(err, lr, create_graph=True))
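For reference, here is a sketch of the kind of workaround I was imagining: instead of mutating the model’s parameters, build the updated parameters as out-of-place tensors that depend on lr, and evaluate the model with them via torch.func.functional_call (available in recent PyTorch versions), so that the second loss stays connected to lr in the graph. This is only a sketch under that assumption, not a vetted solution:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)
model = nn.Linear(2, 2, bias=True)
x = torch.rand(10, 2)
y = x + 0.6

lr = torch.tensor(0.01, requires_grad=True)  # hyperparameter we differentiate wrt
func_loss = nn.MSELoss()

err = func_loss(model(x), y)
grads = torch.autograd.grad(err, model.parameters(), create_graph=True)

# Out-of-place update: each new tensor depends on lr, so autograd can track it.
new_params = {
    name: p - lr * g
    for (name, p), g in zip(model.named_parameters(), grads)
}

# Evaluate the model with the updated parameters without touching model itself.
err2 = func_loss(functional_call(model, new_params, (x,)), y)
(dlr,) = torch.autograd.grad(err2, lr)
print(dlr)
```

The key difference from the loop above is that `param = param - lr * g` only rebinds the loop variable, so the model’s parameters (and lr) never enter the second loss; functional_call routes the lr-dependent tensors into the forward pass instead.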