In my code, I am trying to do something like this:
```python
lr = [nn.Parameter(torch.ones(1) * 0.5) for _ in range(n)]  # a list of per-parameter learning rates that I want to learn
grad = autograd.grad(train_loss, model.parameters(), create_graph=True)
for param_indx, param in enumerate(model.parameters()):
    param.data.add_(-lr[param_indx] * grad[param_indx])  # manual SGD step on the parameters
model_loss = loss(model(input), y)
autograd.grad(model_loss, lr)  # compute the loss gradient with respect to the learning rates
```
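For reference, here is a minimal self-contained version that reproduces the problem on my end. The toy `nn.Linear` model, `MSELoss`, and the random `input`/`y` tensors are just stand-ins for my actual `model`, `loss`, and data:

```python
import torch
import torch.nn as nn
from torch import autograd

torch.manual_seed(0)

# stand-ins for my real setup: a toy linear model and MSE loss
model = nn.Linear(3, 1)
loss = nn.MSELoss()
input, y = torch.randn(8, 3), torch.randn(8, 1)

# one learnable learning rate per model parameter
n = len(list(model.parameters()))
lr = [nn.Parameter(torch.ones(1) * 0.5) for _ in range(n)]

train_loss = loss(model(input), y)
grad = autograd.grad(train_loss, list(model.parameters()), create_graph=True)
for param_indx, param in enumerate(model.parameters()):
    param.data.add_(-lr[param_indx] * grad[param_indx])  # manual SGD step

model_loss = loss(model(input), y)
autograd.grad(model_loss, lr)  # raises the RuntimeError below
```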
Autograd should theoretically be able to compute that gradient, shouldn't it?
But I get an error saying:
```
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.
```