While I understand why this design decision was made, are there any plans to make it easier to save the gradients of intermediate variables? For example, it’d be nice if something like this was supported:
from torch.autograd import Variable
import torch
xx = Variable(torch.randn(1, 1), requires_grad=True)
yy = 3*xx
yy.requires_grad = True # <-- Override default behavior
zz = yy**2
zz.backward()
# do something with yy.grad
It seems like it’d be easier to let variables keep track of their own gradients than to have to track them myself with closures (I’ve sketched that closure/hook workaround below). Then, if I want to analyze the gradients of my variables (leaf or not), I can do something like
do_something_with_data_and_grad_of(xx)
do_something_with_data_and_grad_of(yy)
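Concretely, the closure bookkeeping I have in mind looks roughly like this (a sketch, assuming register_hook on a Variable calls the hook with that variable’s gradient during backward; save_grad and the grads dict are just names I made up for illustration):
from torch.autograd import Variable
import torch

grads = {}  # stash for intermediate gradients, keyed by name

def save_grad(name):
    # return a closure that records the incoming gradient under `name`
    def hook(grad):
        grads[name] = grad
    return hook

xx = Variable(torch.randn(1, 1), requires_grad=True)
yy = 3 * xx
yy.register_hook(save_grad('yy'))  # capture d(zz)/d(yy) during backward
zz = yy ** 2
zz.backward()
# grads['yy'] now holds the gradient of zz with respect to yy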
Also, it might be useful to be able to set requires_grad for intermediate variables. For example, I might want to plot a histogram of intermediate variable gradients without needing gradients for upstream variables. Right now, I’d have to set the requires_grad flag to True on upstream nodes just to make sure that the gradients for this intermediate node are computed, which seems a bit wasteful.
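To make the wasteful part concrete, here’s a sketch using the same hook workaround (again assuming register_hook behaves as above): as far as I can tell, the only way to get yy’s gradient is to mark xx with requires_grad=True, and then backward also computes and stores xx.grad even though I never touch it.
from torch.autograd import Variable
import torch

xx = Variable(torch.randn(1, 1), requires_grad=True)  # needed only so yy is part of the graph
yy = 3 * xx
captured = []
yy.register_hook(captured.append)  # the only gradient I actually want
zz = yy ** 2
zz.backward()
# captured[0] is d(zz)/d(yy); xx.grad was computed and stored as well,
# even though it goes unused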