Hello,
In my project, I need to compute the gradient wrt the input for each output, basically by doing something like this:
for index in selected_outputs:
    output[:, index].backward(retain_variables=True)
# the code continues from here
The batch size at the input is only one here. I observed that each backward() takes about the same amount of time in every iteration, so the loop ends up taking a long time. Is there a better way to do this?
That depends on what you do with the gradients after.
If you need a linear combination of them, you can apply it before and then backward once.
If you need something more complex, I’m afraid you have to call backward multiple times.
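A minimal sketch of the linear-combination trick, assuming a toy linear model (the tensors, indices, and weights here are made up for illustration):

```python
import torch

x = torch.randn(1, 4, requires_grad=True)   # batch size 1, as in the question
W = torch.randn(4, 3)
output = x @ W                              # shape (1, 3)

selected_outputs = [0, 2]                   # hypothetical output indices
weights = torch.tensor([0.5, 2.0])          # hypothetical coefficients

# Apply the linear combination first, then call backward() once
# instead of once per selected output:
combined = (output[:, selected_outputs] * weights).sum()
combined.backward()

# x.grad now holds 0.5 * d(output[:,0])/dx + 2.0 * d(output[:,2])/dx
```

This only works because gradients are linear in the output: backpropagating a weighted sum of outputs gives the same weighted sum of the per-output gradients.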
Can I ask one more question?
If I understand correctly from here, backward creates a directed graph that does the computations and gives the output, and I think that even when retain_variables is set to True, the graph is not retained. Now the autograd docs show that there will be a retain_graph option in the new version of backward (I think). Do you know if that will help? I know I still have to call backward multiple times, but will it make consecutive backwards quicker?
The graph is actually created during the forward pass; all backward does is traverse it.
The difference between retain_variables (or retain_graph, which is its new name) being True and False is this: in the first case, the graph is left unchanged during backward; in the second case, the graph is destroyed as you go through it (to reduce memory usage), so you won’t be able to reuse it.
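A minimal sketch of that behaviour, using the newer retain_graph name (the tensors here are made up for illustration):

```python
import torch

x = torch.randn(1, 3, requires_grad=True)
output = (x * 2).sum(dim=1)          # the graph is built during this forward pass

output.backward(retain_graph=True)   # graph kept: we are allowed to backward again
first_grad = x.grad.clone()

x.grad.zero_()
output.backward()                    # works only because the graph was retained
assert torch.allclose(x.grad, first_grad)

# A third backward() would now raise an error: the last call freed the graph.
```

Note that retaining the graph avoids the error on repeated backward calls, but each call still has to traverse the whole graph, so it does not by itself make consecutive backwards faster.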