Gradient cannot be computed, help!

There is a similar problem one encounters when implementing meta-learning algorithms: they also seem to require in-place updates of leaf variables (nn.Parameters specifically), because the algorithm performs intermediate parameter updates in its inner loop. I was stuck on a similar problem a few weeks ago. The idea is to clone your leaf tensors using .clone() and run your computation on the cloned tensors, so that gradients can still flow back to the original leaf variables. A minimal sketch is shown below.
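Here is a minimal, self-contained sketch of the trick, not the code from the links further down. The tiny linear model, `inner_lr`, and the tensors `x` and `y` are just placeholders for illustration:

```python
import torch
import torch.nn.functional as F

# Hypothetical leaf parameters of a tiny linear model.
w = torch.nn.Parameter(torch.randn(1, 3))
b = torch.nn.Parameter(torch.zeros(1))

def forward(x, weight, bias):
    return x @ weight.t() + bias

x, y = torch.randn(8, 3), torch.randn(8, 1)
inner_lr = 0.01

# Clone the leaves. The clones are non-leaf tensors that stay connected to
# w and b in the graph, so the meta-gradient can flow back to them.
w_fast, b_fast = w.clone(), b.clone()

# Inner-loop loss and gradients w.r.t. the clones.
# create_graph=True is only needed if you want higher-order derivatives.
inner_loss = F.mse_loss(forward(x, w_fast, b_fast), y)
g_w, g_b = torch.autograd.grad(inner_loss, (w_fast, b_fast), create_graph=True)

# Out-of-place update: new tensors are created instead of mutating the
# leaf parameters in place (which autograd would reject).
w_fast, b_fast = w_fast - inner_lr * g_w, b_fast - inner_lr * g_b

# Outer (meta) loss evaluated with the updated clones; backward() still
# reaches the original leaf parameters.
meta_loss = F.mse_loss(forward(x, w_fast, b_fast), y)
meta_loss.backward()
print(w.grad is not None, b.grad is not None)  # True True
```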

Why do we want to do it using .clone()?

Because leaf tensors don't keep any history of the operations applied to them, and a leaf that requires grad doesn't allow in-place operations at all. If you want to understand this in a little more depth, read my response to my own question. You can also take cues from the inner_loop(self, task) function in this code, or from the inner_loop(self, task) function of class SineMAML() in this code. You can even look at a slightly more involved approach that uses hooks in this code (refer to class MetaLearner(object)).
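A quick toy illustration of both points (again, just a hypothetical snippet, not code from the links above):

```python
import torch

p = torch.nn.Parameter(torch.ones(3))

# An in-place update of a leaf that requires grad is rejected by autograd:
try:
    p -= 0.1 * torch.ones(3)
except RuntimeError as e:
    print(e)  # "a leaf Variable that requires grad is being used in an in-place operation."

# The clone is a non-leaf tensor: operations on it are recorded in the graph,
# and updating it out of place keeps p reachable for gradients.
q = p.clone()
q = q - 0.1 * torch.ones(3)
q.sum().backward()
print(p.grad)  # tensor([1., 1., 1.])
```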

P.S.: Why is retain_graph=True? You aren't computing higher-order derivatives, right? It certainly doesn't look like it from the code snippet you shared.

P.P.S.: I may have given a slightly long and convoluted answer, but I thought the added context might help you get a better idea not only of the problem you asked about, but also of one that could arise if you ever want higher-order derivatives (because I saw retain_graph=True in your snippet). Please do ask for clarification if my answer is not clear.
