Does a tensor I created get a grad after loss.backward()?

prop_f = torch.empty([b,c,h,w], device='cuda', requires_grad = False)

prop_f is a tensor that I created, so in theory it is a leaf.
After that, I do:

prop_f[bb,:, hh, ww] = product

I use prop_f to store some important intermediate tensors that are part of the computation graph.
So, after that, does prop_f require grad now? And will gradients flow back through prop_f when I call loss.backward()?
PS: loss is the loss of the whole model.


Yes, it’s a leaf, but it does not require gradients when created, so no .grad will be saved for it.
Why do you want to do this? Why not call .retain_grad() on product directly? Then you can access its .grad attribute after the backward pass.
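To illustrate the .retain_grad() suggestion, here is a minimal sketch. The tensors old_f and kernel are hypothetical stand-ins (tiny, on CPU) for the ones in the question, not the real model tensors.

```python
import torch

# Hypothetical stand-ins for the tensors in the question (tiny, on CPU).
old_f = torch.randn(4, requires_grad=True)   # leaf that requires grad
kernel = torch.randn(4, requires_grad=True)  # leaf that requires grad

product = old_f * kernel   # intermediate (non-leaf) tensor
product.retain_grad()      # ask autograd to keep product.grad after backward

loss = product.sum()
loss.backward()

# loss is a plain sum, so d(loss)/d(product) is 1 for every element.
print(product.grad)
# d(loss)/d(old_f) is the elementwise value of kernel.
print(old_f.grad)
```

Without the retain_grad() call, product.grad would stay None, because autograd only keeps .grad on leaf tensors by default.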

Thanks, @albanD.

def propagate(self, kernels,b,c,h,w):
        prop_f = torch.empty([b,c,h,w], device='cuda', requires_grad = False)  # size [1, 256, 24, 24]

        for bb in range(b):  #normally b==1, ignore this for loop
            for hh in range(h):   #24
                for ww in range(w):   #24
                    product = self.old_f[ bb,:,hh+p, ww+q ] * kernels[ bb, hh*w+ww, p, q ]   # size [1, 256, 1, 1]

                    prop_f[bb,:, hh, ww] = product   # move product into the right pos of prop_f to store
        return prop_f

You see, my purpose here is this: product is of size [1, 256, 1, 1] and prop_f is of size [1, 256, 24, 24],
so I store every temporary product in the right place of prop_f. In total, I am doing something like a sliding-window ‘convolution’. Contrary to what you suggested, my purpose is not to get the grad of either of them, but to get the result of the sliding ‘convolution’.

So, intuitively, I have to create a container that stores each temporary product at the right [hh, ww] position, so that the ‘convolution’ result can be saved and returned. I chose torch.empty to do this.

Finally, my question is this: as you can see, at the beginning prop_f has no grad, since it is a leaf that was created without requiring gradients. But after the whole ‘convolution’ above, does it have a grad now? All of its elements come from computations on graph nodes, such as the product of self.old_f and kernels.

Thanks for reading all of this.


Is it expected that p and q are constants in the code above?

As a first point, even if you make this work, your code is going to be very slow, as nested Python for loops are not efficient. You should try to write your function as operations directly on the full tensors.

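In your case here, the .grad attribute of prop_f won’t be set. The tensor was a leaf when created, but after the in-place writes of values that require gradients, it is not a leaf anymore: it now requires grad and has a grad_fn. Autograd only populates .grad for leaf tensors that require grad (or for non-leaf tensors on which .retain_grad() was called). So the gradients will propagate as expected to self.old_f, but the .grad attribute of prop_f won’t be set.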
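A small sketch that checks this behavior. It is a scaled-down CPU version of the loop in the question; the sizes and the multiplication by 2.0 are placeholders chosen only for illustration.

```python
import warnings
import torch

# Scaled-down stand-in for self.old_f (sizes are illustrative assumptions).
old_f = torch.randn(3, 5, 5, requires_grad=True)

prop_f = torch.empty(3, 4, 4)   # leaf, requires_grad=False
was_leaf = prop_f.is_leaf       # True at creation

# In-place writes of values that are part of the graph.
for hh in range(4):
    for ww in range(4):
        prop_f[:, hh, ww] = old_f[:, hh, ww] * 2.0

is_leaf_after = prop_f.is_leaf               # no longer a leaf
requires_grad_after = prop_f.requires_grad   # now part of the graph
has_grad_fn = prop_f.grad_fn is not None     # records the in-place copies

loss = prop_f.sum()
loss.backward()

with warnings.catch_warnings():
    # Reading .grad on a non-leaf tensor emits a UserWarning.
    warnings.simplefilter("ignore")
    prop_f_grad = prop_f.grad   # stays None

print(was_leaf, is_leaf_after, requires_grad_after, has_grad_fn)
print(prop_f_grad)
print(old_f.grad[:, :4, :4])    # gradient reached old_f through prop_f
```

If you do want to inspect the gradient on prop_f itself, calling prop_f.retain_grad() after the loop and before loss.backward() should make prop_f.grad available.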

Hi @albanD,

First, p and q are just there to simplify the example; in the real code I use other variables in their place, so don’t worry about that. My code runs successfully now, but it may be slow :tired_face:

Second, is there a better approach that avoids these for loops? My operation is not a real convolution: I want to apply a different kernel at each position [hh, ww], so a normal convolution operation might not be suitable for me.

Finally, did you mean that prop_f itself won’t have a .grad after all this, but the gradients from deeper layers will still propagate to self.old_f and kernels? And will they then be used to update the layers that generate self.old_f and kernels?

Thanks, albanD!


Yes, the gradients will flow back properly to the layers above.

For convolution-like operations, you might want to check the im2col function (exposed in PyTorch as torch.nn.functional.unfold). It might help you speed up your code, depending on what you want.
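Here is a rough sketch of how im2col could replace the nested loops, using torch.nn.functional.unfold. It rests on several assumptions that are not confirmed in the thread: old_f is already padded to [b, c, h+k-1, w+k-1], kernels has shape [b, h*w, k, k] with one k-by-k kernel per output position, and the result at each position is the sum over the k-by-k window. The function names propagate_loop and propagate_unfold are made up for this example.

```python
import torch
import torch.nn.functional as F

def propagate_loop(old_f, kernels, h, w):
    """Reference version with Python loops (assumed semantics, see lead-in)."""
    b, c = old_f.shape[:2]
    k = kernels.shape[-1]
    out = torch.zeros(b, c, h, w)
    for bb in range(b):
        for hh in range(h):
            for ww in range(w):
                window = old_f[bb, :, hh:hh + k, ww:ww + k]   # [c, k, k]
                kern = kernels[bb, hh * w + ww]               # [k, k]
                out[bb, :, hh, ww] = (window * kern).sum(dim=(1, 2))
    return out

def propagate_unfold(old_f, kernels, h, w):
    """Same computation without Python loops, via im2col (F.unfold)."""
    b, c = old_f.shape[:2]
    k = kernels.shape[-1]
    # F.unfold gives [b, c*k*k, L], with L = h*w window positions in row-major order.
    patches = F.unfold(old_f, kernel_size=k).reshape(b, c, k * k, h * w)
    # One flattened kernel per position, broadcast over channels: [b, 1, k*k, h*w].
    weights = kernels.reshape(b, h * w, k * k).permute(0, 2, 1).unsqueeze(1)
    return (patches * weights).sum(dim=2).reshape(b, c, h, w)

# Small illustrative sizes (placeholders, not the sizes from the question).
torch.manual_seed(0)
b, c, h, w, k = 1, 3, 4, 5, 3
old_f = torch.randn(b, c, h + k - 1, w + k - 1, requires_grad=True)
kernels = torch.randn(b, h * w, k, k, requires_grad=True)

out_loop = propagate_loop(old_f, kernels, h, w)
out_fast = propagate_unfold(old_f, kernels, h, w)
print(torch.allclose(out_loop, out_fast, atol=1e-5))

# Gradients still flow back to both inputs through the vectorized version.
out_fast.sum().backward()
print(old_f.grad.shape, kernels.grad.shape)
```

If your real operation differs (for example a different window layout or no sum over the window), the reshape and broadcast steps would need to be adapted, but the idea of unfolding once and multiplying by per-position weights should carry over.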
