Why does computing time increase with backpropagation hooks?

I was waiting a very long time for an epoch to complete, but it never got to that point.
When I measured the time taken for each iteration, it kept increasing:
it started at 0.0104 s and within seconds reached 0.103 s. Why does the time keep increasing? Why is it so slow?


    start = time.time()
    optimizer.zero_grad()
    loss.backward()

    with torch.no_grad():
        model.forward(K)  # dummy pass
        M1, M2, M3 = get_Ms()

        model.linear1.weight.retain_grad()
        model.linear1.weight.register_hook(lambda grad: torch.t(torch.mm(M1, torch.t(grad))))

        model.linear2.weight.retain_grad()
        model.linear2.weight.register_hook(lambda grad: torch.t(torch.mm(M2, torch.t(grad))))

        model.linear3.weight.retain_grad()
        model.linear3.weight.register_hook(lambda grad: torch.t(torch.mm(M3, torch.t(grad))))

    optimizer.step()

    end = time.time()
    time_taken = end - start
    print("time_taken =", time_taken)

I wrote this code as a replacement for the code below, which runs fine (not fast, but at a reasonable speed). I expected that moving the gradient transformation into hooks and letting the optimizer's native step do the update would speed up the computation, but instead the time exploded. Why?

    with torch.no_grad():
        model.forward(K)
        M1, M2, M3 = get_Ms()

        model.linear1.weight -= lr * torch.t(torch.mm(M1, torch.t(model.linear1.weight.grad)))
        model.linear2.weight -= lr * torch.t(torch.mm(M2, torch.t(model.linear2.weight.grad)))
        model.linear3.weight -= lr * torch.t(torch.mm(M3, torch.t(model.linear3.weight.grad)))

        model.linear1.bias -= lr * model.linear1.bias.grad
        model.linear2.bias -= lr * model.linear2.bias.grad
        model.linear3.bias -= lr * model.linear3.bias.grad

I found the answer myself; maybe it will help someone.
A hook has to be removed after use, or it piles up: register_hook() adds another hook every time it is called, and every registered hook runs on every backward pass. Here three more hooks accumulate per iteration, so the backward time grows steadily, which matches the timings above.
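
Here is a minimal standalone sketch of the pile-up (a toy tensor of my own, not the model above): each loop iteration registers one more hook without removing the old ones, and all of them fire on the single backward call.

    import torch

    x = torch.ones(3, requires_grad=True)

    # Register a new hook each "iteration" without ever removing the old ones.
    for _ in range(5):
        x.register_hook(lambda grad: grad * 2)

    x.sum().backward()
    print(x.grad)  # tensor([32., 32., 32.]) -- all five hooks ran, each doubling the gradient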

So after optimizer.step() I call h1.remove(), where h1 is the handle returned by register_hook().
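
For completeness, here is a sketch of how the loop from the question could look with handles. The handle names h1/h2/h3 are my own, and I have moved the hook registration before loss.backward() on the assumption that the hooks are meant to transform the gradients of that same pass (in the original order they could only fire on a later iteration's backward). The retain_grad() calls are dropped because the weights are leaf tensors, which keep their .grad by default.

    start = time.time()
    optimizer.zero_grad()

    with torch.no_grad():
        model.forward(K)  # dummy pass
        M1, M2, M3 = get_Ms()

    # Keep the handles so the hooks can be removed later.
    h1 = model.linear1.weight.register_hook(lambda grad: torch.t(torch.mm(M1, torch.t(grad))))
    h2 = model.linear2.weight.register_hook(lambda grad: torch.t(torch.mm(M2, torch.t(grad))))
    h3 = model.linear3.weight.register_hook(lambda grad: torch.t(torch.mm(M3, torch.t(grad))))

    loss.backward()  # the three hooks fire here, once each
    optimizer.step()

    # Remove the hooks so they do not accumulate across iterations.
    h1.remove()
    h2.remove()
    h3.remove()

    end = time.time()
    print("time_taken =", end - start)

With the handles removed every iteration, the number of live hooks stays constant and the per-iteration time should stop growing.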