I would like to implement a new torch.autograd.Function where the gradient is closely related to an intermediate result. Is there a way to store the intermediate result for the backward pass to avoid having to compute it again? (Similar to save_for_backward, but that explicitly isn’t it…)

Why not save_for_backward? It’s there for precisely this purpose.
If not, you can also just assign the intermediate result Tensor to self: self.intermediate = intermediate_result
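A minimal sketch of the save_for_backward route, assuming a current PyTorch where Functions are written in the static-method style with a ctx object (the op and its intermediate are made up for illustration):

```python
import torch

class Square(torch.autograd.Function):
    # Hypothetical op: y = x**2. Its gradient 2*x reuses an
    # intermediate already computed in forward.
    @staticmethod
    def forward(ctx, x):
        two_x = 2 * x                 # intermediate needed again in backward
        ctx.save_for_backward(two_x)  # stash it for the backward pass
        return x * x

    @staticmethod
    def backward(ctx, grad_output):
        (two_x,) = ctx.saved_tensors
        return grad_output * two_x

x = torch.tensor([3.0], requires_grad=True)
Square.apply(x).backward()
print(x.grad)  # tensor([6.])
```

In recent PyTorch versions, save_for_backward accepts intermediate tensors as well, not only inputs and outputs.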

The autograd system can handle intermediate results, but only if they were built from torch.autograd.Variable objects using torch tensor operations, afaik. The error occurs because one of the arguments is not a Variable.

This question is a bit outdated, but I encountered the same situation.

It seems that only arguments of the forward method can be saved.
Saving an intermediate result (a tensor) failed with the message below.
Am I right? And is it okay, performance-wise, to save a tensor like self.intermediate = intermediate_result?

My understanding is that it is safe to save things yourself by assigning them to (harmlessly named) members of self (or ctx for new-style autograd).
In the new style you need to wrap it yourself to get a Variable.
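A sketch of the plain-attribute approach on ctx, with a made-up op for illustration. Unlike save_for_backward, a direct attribute assignment bypasses autograd's sanity checks (for example, detection of in-place modification of saved tensors), but it does work:

```python
import torch

class Cube(torch.autograd.Function):
    # Hypothetical op: y = x**3, gradient 3*x**2 reuses the square.
    @staticmethod
    def forward(ctx, x):
        sq = x * x
        ctx.sq = sq       # plain attribute: no checks, but accepts any object
        return sq * x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 3 * ctx.sq

x = torch.tensor([2.0], requires_grad=True)
Cube.apply(x).backward()
print(x.grad)  # tensor([12.])
```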

Unfortunately, I don’t see where this requirement (store only inputs or outputs) comes from, or how to get around it (storing the value as a member of the Function means you can use that function instance only once).

Using the autograd.Function instance only once is a great way to do this. In fact this is what used to happen all the time when you used operations on variables and is the right thing to do™.

If you absolutely dislike autograd.Function instances or - like myself - like to feel modern, go for the new-style autograd.Function and change the line with the function application to y_pred = MyReLU.apply(x.mm(w1)).mm(w2). You do not need relu when doing this. This will store inputs and other stuff in the context ctx and is what happens nowadays when you use operations on variables.
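For reference, here is what that new-style MyReLU from the tutorial looks like, used exactly via the .apply line quoted above. The shapes of x, w1, and w2 are made up for the sake of a runnable example:

```python
import torch

class MyReLU(torch.autograd.Function):
    # New-style Function: state lives in ctx, and the class is used
    # via .apply() rather than by instantiating it.
    @staticmethod
    def forward(ctx, input):
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        (input,) = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

x = torch.randn(4, 3)
w1 = torch.randn(3, 5, requires_grad=True)
w2 = torch.randn(5, 2, requires_grad=True)
y_pred = MyReLU.apply(x.mm(w1)).mm(w2)  # the line from the post
y_pred.sum().backward()
```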

> Using the autograd.Function instance only once is a great way to do this. In fact this is what used to happen all the time when you used operations on variables and is the right thing to do™.

Ok, that is probably the way to go. Where do I find an example of instantiating the function?

> If you absolutely dislike autograd.Function instances or - like myself - like to feel modern, go for the new-style autograd.Function and change the line with the function application to y_pred = MyReLU.apply(x.mm(w1)).mm(w2). You do not need relu when doing this. This will store inputs and other stuff in the context ctx and is what happens nowadays when you use operations on variables.

The values I need to store are actually integers that occur during the forward pass, so I can’t follow this recipe.
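Non-tensor values such as plain Python integers can simply be assigned as attributes on ctx; save_for_backward is only required for tensors. A sketch with a made-up op in which an integer index from the forward pass is needed again in backward:

```python
import torch

class ScaleByArgmax(torch.autograd.Function):
    # Hypothetical op: double the entry at the argmax position.
    @staticmethod
    def forward(ctx, x):
        idx = int(torch.argmax(x))  # plain Python int, not a tensor
        ctx.idx = idx               # non-tensors go on ctx as attributes
        out = x.clone()
        out[idx] = out[idx] * 2
        return out

    @staticmethod
    def backward(ctx, grad_output):
        grad = grad_output.clone()
        grad[ctx.idx] = grad[ctx.idx] * 2
        return grad

x = torch.tensor([1.0, 5.0, 2.0], requires_grad=True)
ScaleByArgmax.apply(x).sum().backward()
print(x.grad)  # tensor([1., 2., 1.])
```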