I am registering hooks to update my convolutional layers in this way:
for i, conv_layer in enumerate(model.convos):
    conv_layer.weight.retain_grad()
    print("conv_layer.weight.shape ", conv_layer.weight.shape)
    print("Ms[i].shape ", Ms[i].shape)
    h = conv_layer.weight.register_hook(lambda grad: convo_grad_mult(grad, Ms[i]))
The output of the print statements is:
conv_layer.weight.shape torch.Size([17, 3, 3, 3])
Ms[i].shape torch.Size([3, 3])
conv_layer.weight.shape torch.Size([35, 17, 3, 3])
Ms[i].shape torch.Size([17, 17])
conv_layer.weight.shape torch.Size([9, 35, 3, 3])
Ms[i].shape torch.Size([35, 35])
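For context, here is a minimal stand-in that produces exactly these shapes (SmallNet and the identity matrices are made up for illustration; my real model and Ms are different, but the layer and matrix sizes match the printout above):

import torch
import torch.nn as nn

# Hypothetical stand-in: three conv layers whose weight shapes match the
# printed ones ([17, 3, 3, 3], [35, 17, 3, 3], [9, 35, 3, 3]).
class SmallNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.convos = nn.ModuleList([
            nn.Conv2d(3, 17, kernel_size=3),
            nn.Conv2d(17, 35, kernel_size=3),
            nn.Conv2d(35, 9, kernel_size=3),
        ])

    def forward(self, x):
        for conv in self.convos:
            x = torch.relu(conv(x))
        return x

model = SmallNet()
# One square matrix per layer, sized by that layer's in_channels;
# identity matrices here are only placeholders for the real Ms.
Ms = [torch.eye(3), torch.eye(17), torch.eye(35)]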
Then, when the hooks fire during backward(), I print the sizes of the arguments inside convo_grad_mult:
> M torch.Size([35, 35])
> grad torch.Size([35, 81])
> M torch.Size([35, 35]) **# here I get the error, since I expect M of size [17, 17]**
> grad torch.Size([17, 315])
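For reference, a simplified stand-in for convo_grad_mult that produces those grad shapes (the key assumption is that the [out, in, k, k] weight gradient is flattened to [in_channels, out_channels*k*k] before being multiplied by the [in_channels, in_channels] matrix M, which is consistent with the [35, 81] and [17, 315] sizes above):

def convo_grad_mult(grad, M):
    # grad arrives with the weight's shape: [out_channels, in_channels, k, k]
    out_ch, in_ch, k1, k2 = grad.shape
    # Flatten to [in_channels, out_channels * k * k] so M ([in, in]) can left-multiply it.
    g = grad.permute(1, 0, 2, 3).reshape(in_ch, -1)
    print("M", M.shape)
    print("grad", g.shape)
    # A tensor returned from a hook replaces the gradient, so reshape back.
    return (M @ g).reshape(in_ch, out_ch, k1, k2).permute(1, 0, 2, 3)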
Why am I receiving the same M again, when I should clearly have received the M of size [17, 17]?
Here is the full training loop.
EDIT:
Here is a much simpler setting that reproduces the issue:
def dummy(arg):
    print(arg)
…
for i, conv_layer in enumerate(model.convos):
    print("i={}".format(i))
    print("conv_layer.weight.shape ", conv_layer.weight.shape)
    h = conv_layer.weight.register_hook(lambda grad: dummy(i))
    hooks.append(h)
Output:
i=0
conv_layer.weight.shape torch.Size([17, 3, 3, 3])
i=1
conv_layer.weight.shape torch.Size([35, 17, 3, 3])
i=2
conv_layer.weight.shape torch.Size([9, 35, 3, 3])
After calling backward() I get:
2
2
2
Why does it only register the last one? I am clearly registering hooks for different conv layers.
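For completeness, here is a runnable version of this reproduction, reusing the hypothetical SmallNet stand-in from above (the input size and loss are arbitrary, just enough to make backward() fire the hooks):

model = SmallNet()
hooks = []

def dummy(arg):
    print(arg)

for i, conv_layer in enumerate(model.convos):
    h = conv_layer.weight.register_hook(lambda grad: dummy(i))
    hooks.append(h)

# Arbitrary forward/backward pass just to trigger the hooks.
out = model(torch.randn(1, 3, 32, 32))
out.sum().backward()   # prints 2, 2, 2 rather than 2, 1, 0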
If I do
h1 = model.convos[0].weight.register_hook(lambda grad: dummy(0))
h2 = model.convos[1].weight.register_hook(lambda grad: dummy(1))
h3 = model.convos[2].weight.register_hook(lambda grad: dummy(2))
then I get the correct output:
2
1
0
Why does it not work with the loop?
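I suspect this might be related to how the lambda captures the loop variable i (late binding). If so, binding it as a default argument, as sketched below, should give each hook its own i, but I have not verified this in the full training loop, and I would still like to understand why the plain lambda sees the same i everywhere:

for i, conv_layer in enumerate(model.convos):
    # The default argument is evaluated when the lambda is created,
    # so (presumably) each hook keeps its own copy of i.
    h = conv_layer.weight.register_hook(lambda grad, i=i: dummy(i))
    hooks.append(h)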