Failing to re-create the computational graph in a for-loop


(Ecolss) #1

I have a variable a and a set of functions f_k(a), so I create a tensor to hold the results of all these functions. Each time a function is computed, I also need to compute its gradient with respect to a. Here is what I did:

import torch as th

k = 2
a = th.tensor(2.0, dtype=th.float32, requires_grad=True)  # variable 
b = th.zeros(k, dtype=th.float32)  # tensor to hold results of the k functions
c = th.tensor(0.0, dtype=th.float32)

### this for-loop is just a dummy demo when I don't use a tensor to hold results
for i in range(k):
    c = a*i
    g = th.autograd.grad(c, a)   # works fine
    print(g)


### this is what I wanted to do
for i in range(k):
    b[i] = a*i
    g = th.autograd.grad(b[i], a)  # FAIL when i=1, won't create graph again for i=1?
    print(g)

My code doesn’t work. I expected each iteration to create a new graph for b[i], so that calling grad on it would be valid, but that turns out not to be the case. Why?


(Alban D) #2

Hi,

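You should not make b a Tensor. Assigning to b[i] writes into b in place, so b as a whole carries the autograd history of every write. That means that when you are at i=1, b still refers to the computations done during i=0, and calling backward will try to backward through the graph of i=0 again, even though the first grad call already freed it.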
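The explanation above can be checked with a small sketch. It reuses the names a and b from the question; the flag failed_on_second_call is a hypothetical helper added only for illustration. The sketch inspects b.grad_fn before and after the first in-place write, then repeats the failing call inside a try/except.

```python
import torch as th

a = th.tensor(2.0, dtype=th.float32, requires_grad=True)
b = th.zeros(2, dtype=th.float32)

# Before any write, b is a plain tensor with no autograd history.
had_history_before = b.grad_fn is not None

# The in-place write makes the whole tensor b part of the graph of a*0.
b[0] = a * 0
has_history_after = b.grad_fn is not None

# First call works; without retain_graph=True it frees the graph it walked.
th.autograd.grad(b[0], a)

# The second write chains onto the history left by the first one,
# so going backward from b[1] also reaches the freed graph of i=0.
b[1] = a * 1
try:
    th.autograd.grad(b[1], a)
    failed_on_second_call = False  # hypothetical flag, for illustration only
except RuntimeError as err:
    failed_on_second_call = True
    print("second grad call failed:", err)

print(had_history_before, has_history_after, failed_on_second_call)
```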
I would just use a list for b and you won’t have any problem.
If you ever need to compute gradients through each f_k again, you need to pass retain_graph=True to the grad function.
If you want to use b as a single Tensor after the loop, just call torch.stack or torch.cat on the list b that will contain all your results!
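A minimal sketch of the list-based approach described above. It assumes the same toy functions f_i(a) = a*i as in the question; the names results and grads and the value k = 3 are illustrative choices, not from the original post. retain_graph=True is passed only because the sketch backpropagates through the stacked tensor afterwards; it can be dropped if each graph is used once.

```python
import torch as th

k = 3  # illustrative; the original post uses k = 2
a = th.tensor(2.0, dtype=th.float32, requires_grad=True)

results = []  # plain Python list instead of a preallocated tensor
grads = []
for i in range(k):
    out = a * i  # each iteration builds its own independent graph
    # grad returns a tuple with one entry per input; keep the graph
    # alive so that b can be differentiated again after the loop.
    (g,) = th.autograd.grad(out, a, retain_graph=True)
    results.append(out)
    grads.append(g)

# Combine the 0-dim results into a single tensor of shape (k,).
b = th.stack(results)

# b is still connected to a, so a second backward pass is possible.
b.sum().backward()

print(grads)
print(b)
print(a.grad)
```

th.stack is used here because each result is a 0-dim tensor; th.cat needs tensors with at least one dimension, so it would suit results that are already vectors.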


(Ecolss) #3

Yes, there is no problem at all if b is a list. I just wanted to avoid a list, since calling torch.cat or torch.stack might also bring in a little overhead (maybe marginal) from creating a tensor from a list, right?


(Alban D) #4

Well, you create the Tensor here yourself anyway ;)
And the caching allocator is good enough that if you repeat the operation at each forward pass, the allocation is essentially free.