Hi everyone,
I am trying to implement MetaSGD, where learning rates are learnable. I am having trouble getting gradients for them.
The code:
```python
for j in range(iterations):
    self.optim.zero_grad()
    inner_output = self.model(inner_x)
    inner_loss = self.criterion(inner_output, inner_y)
    grads = torch.autograd.grad(inner_loss, self.model.parameters(), create_graph=True)
    for i, param in enumerate(self.model.parameters()):
        param.grad = self.lrs[i] * grads[i]
    self.optim.step()

outer_output = self.model(outer_x)
outer_loss = self.criterion(outer_output, outer_y)
# These lines should create the grads for the learning rates and update them accordingly.
outer_loss.backward()
# lrs.grad = None
other_optimizer.step()
```
What happens is that `lrs.grad` is `None`, meaning that `param.grad = self.lrs[i] * grads[i]` probably does not add anything to the computation graph used for calculating the gradients, so nothing flows back to the learning rates.
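A tiny standalone repro (hypothetical names, a single plain tensor instead of a full model) shows the same behaviour. My understanding is that `optimizer.step()` updates the parameters under `torch.no_grad()`, so the path from the learning rate through `.grad` never enters the graph:

```python
import torch

# Minimal sketch: one weight and one learnable learning rate.
w = torch.nn.Parameter(torch.tensor([2.0]))
lr = torch.nn.Parameter(torch.tensor([0.1]))

loss = (w * 3.0).sum()
g, = torch.autograd.grad(loss, [w], create_graph=True)

# Assigning to .grad just stores a value; SGD.step() then updates w
# without tracking gradients, so the update is invisible to autograd.
w.grad = lr * g
torch.optim.SGD([w], lr=1.0).step()

w.grad = None  # mirror zero_grad before the outer backward
outer_loss = (w * 3.0).sum()
outer_loss.backward()
print(lr.grad)  # None: no graph connects lr to outer_loss
```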
And if I change the code to:
```python
for j in range(iterations):
    self.optim.zero_grad()
    inner_output = self.model(inner_x)
    inner_loss = self.criterion(inner_output, inner_y)
    grads = torch.autograd.grad(inner_loss, self.model.parameters(), create_graph=True)
    for i, param in enumerate(self.model.parameters()):
        param -= self.lrs[i] * grads[i]

outer_output = self.model(outer_x)
outer_loss = self.criterion(outer_output, outer_y)
# These lines should create the grads for the learning rates and update them accordingly.
outer_loss.backward()
other_optimizer.step()
```
Then `param -= self.lrs[i] * grads[i]` raises a RuntimeError, because it is an in-place operation, which is not allowed on leaf variables that require grad.
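The error reproduces on any leaf tensor that requires grad (names here are illustrative):

```python
import torch

w = torch.nn.Parameter(torch.tensor([2.0]))
lr = torch.tensor([0.1], requires_grad=True)

loss = (w * 3.0).sum()
g, = torch.autograd.grad(loss, [w], create_graph=True)

try:
    w -= lr * g  # in-place update of a leaf that requires grad
    raised = False
except RuntimeError as e:
    raised = True
    print(e)  # complains about an in-place operation on a leaf Variable
```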
<edit>
With `param = param - self.lrs[i] * grads[i]`, no learning happens either, because the assignment only rebinds the local name `param` to a new tensor; the model's parameter itself is never updated. And `self.lrs.grad` is `None` as well.
</edit>
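The rebinding problem can be seen in isolation, independent of MetaSGD (a minimal sketch with a one-weight linear layer):

```python
import torch

model = torch.nn.Linear(1, 1, bias=False)
before = model.weight.detach().clone()

for param in model.parameters():
    # This rebinds the loop variable to a new tensor; the module still
    # holds the original Parameter object, so nothing changes in the model.
    param = param - 0.1

print(torch.equal(model.weight, before))  # True: weight is untouched
```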
Does anyone know a solution, i.e. a way to get gradients for the learning rates?