Meta-learning using second-order derivative

M_R · December 21, 2021, 11:54am

Hi,
Does PyTorch support the second-order derivative? I am trying to run the following code:

student_parameters = [param for param in Student.parameters()]
teacher_params = [param for param in Teacher.parameters()]

psudo_labels = Teacher(inputs)

output = Student(inputs)

err1 = torch.nn.MSELoss()(output,psudo_labels)

err1.backward(retain_graph=True, create_graph=True,inputs=student_parameters)

optimizer_student.step()

output_new = Student(inputs2)

new_loss = torch.nn.MSELoss()(output_new, GT)

teacher_grad = grad(new_loss, teacher_params)

But I get the following Runtime error:

RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

AlphaBetaGamma96 · December 21, 2021, 2:35pm

You haven’t used the variable GT anywhere in your code, so the derivative is equal to 0 by definition. That’s what the error message is stating: “One of the differentiated Tensors appears to not have been used in the graph.” Are you using the correct variable here?

M_R · December 22, 2021, 5:14am

Sorry for not being clear. GT is a constant tensor that is defined somewhere beforehand. Think of it as a label or ground-truth.

AlphaBetaGamma96 · December 22, 2021, 11:04am

Oh OK, so I misread your issue. The reason you’re getting that error is that torch.autograd.grad is taking the gradient of new_loss, which depends solely on Student, with respect to the parameters of Teacher. This is equal to zero by definition, as the graph of new_loss doesn’t contain teacher_params.
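To make that concrete, here is a minimal sketch that reproduces the same error outside of your setup. The tensors w_teacher and w_student are hypothetical stand-ins for the two models' parameters, not your actual networks.

```python
import torch

# Hypothetical stand-ins for the two parameter sets (assumption, not the original models).
w_teacher = torch.randn(3, requires_grad=True)
w_student = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# This loss is built only from w_student; w_teacher never enters its graph.
loss = ((w_student * x).sum() - 1.0) ** 2

# Asking for d(loss)/d(w_teacher) raises the RuntimeError from the question.
try:
    torch.autograd.grad(loss, [w_teacher], retain_graph=True)
    raised = False
except RuntimeError:
    raised = True

# With allow_unused=True, autograd returns None for the unused tensor instead of raising.
(g,) = torch.autograd.grad(loss, [w_teacher], allow_unused=True)
```

Note that allow_unused=True only silences the error. It does not create a dependency on the teacher, so the returned gradient is None.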

M_R · December 25, 2021, 6:23am

Well, that’s exactly my question.
I do “optimizer_student.step()”, which uses gradients computed from “err1 = torch.nn.MSELoss()(output,psudo_labels)”, which depends on the teacher’s parameters. The reason I am getting this error is that optimizer_student updates the student’s parameters using in-place operations, so the teacher’s parameters are cut off from the computation graph at that point. I wonder how I can achieve this goal.

Ethel · December 27, 2021, 8:43pm

There is a lot of interesting literature on meta-learning with … The First-Order MAML ignores the second derivative part in red.
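One way to keep the teacher in the graph is to write the student update out of place instead of calling optimizer.step(). The following is only a sketch under strong assumptions: both models are replaced by hypothetical single weight matrices (teacher_w, student_w), the update is plain SGD with a made-up learning rate lr, and the data are random tensors. It is not the original poster's setup.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical linear "models": one weight matrix each (assumption for illustration).
teacher_w = torch.randn(4, 1, requires_grad=True)
student_w = torch.randn(4, 1, requires_grad=True)
inputs, inputs2 = torch.randn(8, 4), torch.randn(8, 4)
GT = torch.randn(8, 1)  # constant ground-truth labels
lr = 0.1                # assumed SGD learning rate

# Inner loss: the student fits the teacher's pseudo labels (not detached).
pseudo_labels = inputs @ teacher_w
err1 = F.mse_loss(inputs @ student_w, pseudo_labels)

# create_graph=True makes the student gradient itself differentiable,
# so it keeps its dependence on teacher_w.
(student_grad,) = torch.autograd.grad(err1, student_w, create_graph=True)

# Out-of-place SGD step: a new tensor that is still connected to teacher_w.
student_w_new = student_w - lr * student_grad

# Outer loss uses the updated student weights and the ground truth.
new_loss = F.mse_loss(inputs2 @ student_w_new, GT)

# This now works, because new_loss depends on teacher_w through student_grad.
(teacher_grad,) = torch.autograd.grad(new_loss, teacher_w)
```

For real nn.Module students, the same idea needs a forward pass that accepts the updated parameters explicitly. In recent PyTorch versions, torch.func.functional_call is one option for that, though I have not checked it against your exact models.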