Does a created tensor have grad when loss.backward() is called?

prop_f = torch.empty([b,c,h,w], device='cuda', requires_grad = False)

prop_f is a tensor that I created, so in theory it is a leaf.
After that,

prop_f[bb,:, hh, ww] = product

I use prop_f to store some important intermediate tensors that are in the computation graph.
So, after that, does prop_f require grad now? And will prop_f get a gradient when loss.backward() is called?
P.S. loss is the loss of the whole model.

Hi,

Yes, it’s a leaf, but it does not require gradients when created, so no .grad will be saved for it.
Why do you want to do this? Why not use .retain_grad() on product directly? Then you can access the .grad attribute on it.
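
For example, a minimal sketch of what .retain_grad() does (the shapes and names here are made up for illustration, not your actual model):

import torch

x = torch.randn(4, requires_grad=True)   # a leaf that requires grad
product = x * 2                          # non-leaf intermediate result
product.retain_grad()                    # keep .grad for this non-leaf after backward

loss = product.sum()
loss.backward()

print(x.grad)        # populated, since x is a leaf requiring grad
print(product.grad)  # also populated, thanks to retain_grad()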

Thanks @albanD,

def propagate(self, kernels, b, c, h, w):
    prop_f = torch.empty([b, c, h, w], device='cuda', requires_grad=False)  # size [1, 256, 24, 24]

    for bb in range(b):          # normally b == 1, ignore this for loop
        for hh in range(h):      # 24
            for ww in range(w):  # 24
                product = self.old_f[bb, :, hh+p, ww+q] * kernels[bb, hh*w+ww, p, q]  # size [1, 256, 1, 1]

                prop_f[bb, :, hh, ww] = product  # move product into the right position of prop_f to store it
    return prop_f

You see, my purpose here is this: product is of size [1, 256, 1, 1] and prop_f is of size [1, 256, 24, 24],
so I store every temporary product in the right place of prop_f. Overall I am processing things in a sliding-window 'convolution' fashion. It is not what you assumed: my purpose is not to get the grad of either of them, but to get the sliding 'convolution' result.

So, intuitively, I have to create a container to store all those temporary products at the right [hh, ww] positions, so that the 'convolution' result can be saved and returned. And I chose torch.empty to do this.

Finally, my question is this: as you can see, at the beginning prop_f has no grad, since it is a leaf when created. But after this whole 'convolution' above, does it have grad now, given that all its elements are obtained from computation-graph calculations, such as the calculation between the graph nodes self.old_f and kernels?

Thanks for reading all of this.

Hi,

Is it expected above that p and q are constants?

As a first point, even if you make this work, your code is going to be very slow, as nested Python for loops are not efficient. You should try to write your function as operations directly on the full matrices.

In your case here, the .grad attribute won’t be set. The original tensor was a leaf, but after doing in-place ops on it, it is not a leaf anymore. So the gradients will propagate as expected to self.old_f, but the .grad attribute of prop_f won’t be set.
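
A small standalone sketch of this behaviour (made-up shapes, not your actual model):

import torch

old_f = torch.randn(4, requires_grad=True)  # stands in for self.old_f
prop_f = torch.empty(4)                     # leaf created with requires_grad=False
print(prop_f.is_leaf)                       # True

prop_f[:] = old_f * 2                       # in-place write of a graph result
print(prop_f.is_leaf)                       # False: prop_f is now an intermediate node

prop_f.sum().backward()
print(old_f.grad)                           # populated: gradients flowed back
print(prop_f.grad)                          # None (PyTorch may also warn): non-leaf, no .grad stored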

Hi @albanD,

First, p and q are just there for a simpler presentation; I use other variables in their place in the real code, so don't worry about this. My code now runs successfully, but maybe slowly :tired_face:

Second, is there any better method I could try to avoid these for loops? My purpose is not a real convolution, but to convolve different kernels at different positions [hh, ww], so the normal convolution operation might not be suitable for me.

Finally, did you mean that prop_f itself won't have grad after all this, but the grad from deeper layers will still propagate to self.old_f and kernels, and then adjust self.old_f and kernels w.r.t. the corresponding layers that generate them?

Thanks, @albanD!

Hi,

Yes, the gradients will flow back properly to the layers above.

For convolution-like operations, you might want to check the im2col function (torch.nn.functional.unfold). It might help you speed up your code depending on what you want.
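
For reference, here is one possible way to vectorize a per-position kernel product with unfold. This is only a sketch under my guess of what the real code computes (a k x k window of p, q offsets summed at each output position, with old_f already padded so every position sees a full patch); propagate_vectorized and k are names I made up, so adapt it to your actual loop:

import torch
import torch.nn.functional as F

def propagate_vectorized(old_f, kernels, b, c, h, w, k):
    # old_f:   [b, c, h + k - 1, w + k - 1]  (padded so each output position sees a k x k patch)
    # kernels: [b, h * w, k, k]              (one k x k kernel per output position)
    # returns: [b, c, h, w]
    patches = F.unfold(old_f, kernel_size=k)         # [b, c * k * k, h * w]
    patches = patches.view(b, c, k * k, h * w)       # separate channel and kernel-position dims
    weights = kernels.view(b, h * w, k * k)          # flatten each per-position kernel
    weights = weights.permute(0, 2, 1).unsqueeze(1)  # [b, 1, k * k, h * w]
    out = (patches * weights).sum(dim=2)             # weighted sum over each k x k window
    return out.view(b, c, h, w)

Since this only uses differentiable tensor ops instead of in-place writes into a pre-allocated container, gradients flow back to both old_f and kernels, and the nested Python loops disappear.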
