I was waiting a long time for a single epoch to complete, but it never got there. When I measured the time taken per iteration, it kept increasing: it started at about 0.0104 s and within a few seconds had reached 0.103 s. Why does the time keep growing, and why is it so slow to begin with? Here is the per-iteration code I'm timing:
start = time.time()

optimizer.zero_grad()
loss.backward()

with torch.no_grad():
    model.forward(K)  # dummy pass
    M1, M2, M3 = get_Ms()

# register hooks so that each weight gradient is replaced by (M @ grad.T).T during backward
model.linear1.weight.retain_grad()
model.linear1.weight.register_hook(lambda grad: torch.t(torch.mm(M1, torch.t(grad))))
model.linear2.weight.retain_grad()
model.linear2.weight.register_hook(lambda grad: torch.t(torch.mm(M2, torch.t(grad))))
model.linear3.weight.retain_grad()
model.linear3.weight.register_hook(lambda grad: torch.t(torch.mm(M3, torch.t(grad))))

optimizer.step()

end = time.time()
time_taken = end - start
print("time_taken=", time_taken)
I wrote this code as a replacement for the code below, which runs just fine (not fast, but at a reasonable speed). I expected that registering hooks would let the optimizer's native step apply my gradient transformation and therefore speed up the computation, but instead the time per iteration exploded. Why is that?
with torch.no_grad():
    model.forward(K)  # dummy pass
    M1, M2, M3 = get_Ms()

    # gradient-descent step on each weight, using the transformed gradient (M @ grad.T).T
    model.linear1.weight -= lr * torch.t(torch.mm(M1, torch.t(model.linear1.weight.grad)))
    model.linear2.weight -= lr * torch.t(torch.mm(M2, torch.t(model.linear2.weight.grad)))
    model.linear3.weight -= lr * torch.t(torch.mm(M3, torch.t(model.linear3.weight.grad)))

    # plain gradient-descent update for the biases
    model.linear1.bias -= lr * model.linear1.bias.grad
    model.linear2.bias -= lr * model.linear2.bias.grad
    model.linear3.bias -= lr * model.linear3.bias.grad
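To make the intended equivalence concrete, here is a small self-contained sketch of what I expect the hook-based version to do: the hook transforms the gradient during backward, so a plain SGD step then applies the same update as the manual code above. (The layer size, M, the input data, and the learning rate are made up purely for this illustration.)

import torch

torch.manual_seed(0)

lr = 0.1
linear = torch.nn.Linear(4, 3)   # made-up layer size
M = torch.randn(4, 4)            # made-up transformation matrix (in_features x in_features)

# hook replaces the weight gradient with (M @ grad.T).T during backward
linear.weight.register_hook(lambda grad: torch.t(torch.mm(M, torch.t(grad))))

x = torch.randn(8, 4)
loss = linear(x).pow(2).mean()

w_before = linear.weight.detach().clone()
loss.backward()                  # .grad now holds the transformed gradient

optimizer = torch.optim.SGD(linear.parameters(), lr=lr)
optimizer.step()                 # applies W -= lr * transformed_grad

# .grad already holds the hooked gradient, so this reproduces the manual update
manual = w_before - lr * linear.weight.grad
print(torch.allclose(linear.weight.detach(), manual))   # expect True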