Gradient print during backward call

In the code below, why does the gradient print an increasing number of times as the epoch count goes up?

import torch
import torch.nn as nn


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.w = nn.Parameter(torch.ones([2, 2]))

    def forward(self, x):
        out = x * self.w
        self.w.register_hook(lambda grad: print("GRADIENT OF SELF.W IS"))
        self.w.register_hook(lambda grad: print(grad))
        return out


size1 = 2
learning_rate = 0.0005
net = Net()
criterion = nn.L1Loss()
optimizer = torch.optim.SGD(net.parameters(), lr=learning_rate, momentum=0)
x = torch.ones([1, 1, size1, size1])
target = torch.zeros([1, 1, size1, size1])  # , requires_grad=True)
target[0, 0, :, :] = 0.95
x.requires_grad = True

for i in range(0, 5):
    print("*** Epoch number is ", i)
    x1 = net(x)
    l = criterion(target, x1)
    l.backward()
    optimizer.step()
    optimizer.zero_grad()

Hi,

It’s because self.w is always the same Tensor, and every time you do a forward, you add another hook onto it.
These hooks don’t go away by themselves, so you just end up with more and more of them.
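You can see the accumulation with a small standalone sketch (the counter and the Tensor here are just for illustration, not from your code):

import torch

calls = 0
def counting_hook(grad):
    global calls
    calls += 1

t = torch.ones(2, 2, requires_grad=True)
for i in range(3):
    t.register_hook(counting_hook)  # one more hook on the same Tensor each iteration
    calls = 0
    (t * 2).sum().backward()
    print("iteration", i, "-> hook fired", calls, "times")  # prints 1, then 2, then 3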

Does it consume memory or have some adverse effect if we don’t remove them?

No.
But if you add a new one at each iteration, it is expected that at iteration n, you will have n hooks being called.
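If you do want to keep the registration inside forward, note that register_hook returns a handle you can call .remove() on, so only one hook stays attached at a time. A minimal sketch, assuming you store the handle yourself (the attribute name self._w_hook is just for illustration):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones([2, 2]))
        self._w_hook = None  # handle of the hook registered in the previous forward

    def forward(self, x):
        if self._w_hook is not None:
            self._w_hook.remove()  # drop the hook added by the previous forward
        self._w_hook = self.w.register_hook(lambda grad: print("GRADIENT OF SELF.W IS", grad))
        return x * self.w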

I think what you want here is to move the hook registration into the __init__ of your nn.Module so that it is done only once; then at every iteration, the hook will be called once, as sketched below.
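Something along these lines, keeping the rest of your training loop unchanged (a minimal sketch of that suggestion):

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones([2, 2]))
        # registered once here, so it fires exactly once per backward
        self.w.register_hook(lambda grad: print("GRADIENT OF SELF.W IS", grad))

    def forward(self, x):
        return x * self.w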


You just pointed out the right solution. What I don’t understand is how, when the hook is registered in __init__, it still gets called during backward in every epoch. Until now I assumed __init__ plays no role in forward or backward; it is just run once when the model is created.

Your Tensor self.w is defined once during __init__ and then re-used in every call to forward.
So if you register a hook on that Tensor, it stays attached and is taken into account on every backward, because the same Tensor is re-used at every forward.

In your code above, if you register the hook on out you won’t see this behavior anymore because out is a new Tensor at every forward. The behavior you observe here happens because self.w is the same Tensor that is re-used at every forward.
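For comparison, a sketch of registering the hook on out inside forward; since out is a brand-new Tensor at every call, the hooks do not accumulate:

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.ones([2, 2]))

    def forward(self, x):
        out = x * self.w
        # out is a fresh Tensor each forward, so this hook lives only on this graph
        out.register_hook(lambda grad: print("GRADIENT OF OUT IS", grad))
        return out

Note that the gradient you see then is the gradient with respect to out, not with respect to self.w.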
