You haven’t used the variable GT anywhere in your code, so the derivative is equal to 0 by definition. That’s what the error message is stating: “One of the differentiated Tensors appears to not have been used in the graph.” Are you using the correct variable here?

Oh ok, so I made a mistake on your issue. The reason you’re getting that error is that torch.autograd.grad is taking the gradient of new_loss, which depends solely on Student, with respect to the parameters of Teacher. This is equal to zero by definition, as Student doesn’t contain teacher_params.
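To make that concrete, here is a minimal sketch that reproduces the message. The tensors `w_teacher`, `w_student`, and `x` are made up for illustration and are not from the original code; the only point is that `new_loss` never touches `w_teacher`.

```python
import torch

# Hypothetical stand-ins for the teacher and student parameters.
w_teacher = torch.randn(3, requires_grad=True)
w_student = torch.randn(3, requires_grad=True)
x = torch.randn(3)

# new_loss is built from the student only; w_teacher is not in its graph.
new_loss = (w_student * x).sum() ** 2

raised = False
message = ""
try:
    # Asking for d(new_loss)/d(w_teacher) fails because w_teacher is unused.
    torch.autograd.grad(new_loss, w_teacher, retain_graph=True)
except RuntimeError as e:
    raised = True
    message = str(e)

# With allow_unused=True, autograd returns None instead of raising.
(grad_teacher,) = torch.autograd.grad(new_loss, w_teacher, allow_unused=True)
print(raised, grad_teacher)
```

Note that `allow_unused=True` only silences the error. It does not create a dependency on the teacher, so the returned gradient is `None`.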

Well that’s exactly my question.
I do “optimizer_student.step()”, which uses gradients computed from “err1 = torch.nn.MSELoss()(output,psudo_labels)”, which depends on the teacher’s parameters. The reason I am getting this error is that optimizer_student updates the student’s parameters with in-place operations, so the teacher’s parameters are cut off from the computation graph at that point. I wonder how I can keep that dependency so the gradient can flow back to the teacher.
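One way this could be approached, sketched under assumptions: skip `optimizer_student.step()` for the inner update and instead apply the SGD step out of place, with `create_graph=True`, so the updated student weights remain a function of the teacher. Everything below (the linear "models", `lr`, the random data, and names such as `student_w_new`) is hypothetical and only illustrates the idea; it is not the original training loop.

```python
import torch

torch.manual_seed(0)

# Hypothetical linear teacher and student, represented by weight tensors.
teacher_w = torch.randn(4, 1, requires_grad=True)
student_w = torch.randn(4, 1, requires_grad=True)
x_unlabeled = torch.randn(8, 4)
x_labeled = torch.randn(8, 4)
y_labeled = torch.randn(8, 1)
lr = 0.1  # assumed inner learning rate

# Pseudo labels come from the teacher, so err1 depends on teacher_w.
pseudo_labels = x_unlabeled @ teacher_w
output = x_unlabeled @ student_w
err1 = ((output - pseudo_labels) ** 2).mean()  # same value as MSELoss()

# create_graph=True keeps the gradient itself differentiable.
(g_student,) = torch.autograd.grad(err1, student_w, create_graph=True)

# Out-of-place SGD step: student_w_new is still connected to teacher_w.
student_w_new = student_w - lr * g_student

# Loss of the updated student on labeled data, differentiated w.r.t. the teacher.
new_loss = ((x_labeled @ student_w_new - y_labeled) ** 2).mean()
(g_teacher,) = torch.autograd.grad(new_loss, teacher_w)
print(g_teacher.shape)
```

For real `nn.Module` students, the same pattern needs a functional forward pass with the updated weights, for example via `torch.func.functional_call` in recent PyTorch versions. Differentiating through the update also costs extra memory, since the graph of the inner step has to be kept.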

There is a lot of interesting literature on meta-learning with … The First-Order MAML ignores the second derivative part in red.