A concern that gradients are deleted before the end of the backward calculation

ortasa · July 9, 2018, 8:10am

Hi,

I have a very large network: a Siamese architecture with 30,262,656 parameters. I’ve been trying to train it for a long time without success. The loss starts at 1 and stays more or less constant.

When I started to print the gradients, everything started working. The loss dropped to zero and the network started to learn.

loss.backward()
optimizer.step()
print(rank_module.last_layer[0].weight.grad)
optimizer.zero_grad()

I think that the zero_grad() call deletes all the gradients without checking whether the step() call has finished.

I would appreciate it if you could look at it.

Thanks,
Ortal

ptrblck · July 9, 2018, 8:16am

Without the print statement your model isn’t learning at all?
Once you add print to your training loop, the loss goes down to zero?

Have you tried to fix your random seeds and run it again with and without the print?
Is this issue reproducible on your machine?
The print statement should not change anything in your training.
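
A rough way to run this check is sketched below. It is not your model: the tiny `nn.Linear` layer, the synthetic data, the learning rate and the `train` helper are all placeholders I made up for illustration, and it runs on the CPU. The idea is to seed the RNG identically, train once with the print and once without, and compare the recorded losses.

```python
import torch
import torch.nn as nn


def train(print_grad: bool, steps: int = 20, seed: int = 0) -> list:
    """Train a toy model and return the loss recorded at every step."""
    # Same seed -> same initial weights and same synthetic data in both runs.
    torch.manual_seed(seed)
    model = nn.Linear(4, 1)  # hypothetical stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    x = torch.randn(32, 4)
    y = x.sum(dim=1, keepdim=True)  # simple target the model can fit

    losses = []
    for _ in range(steps):
        loss = nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()  # parameters are updated here
        if print_grad:
            # Only reads the gradient tensor; it does not modify it.
            print(model.weight.grad)
        optimizer.zero_grad()  # gradients are reset only after the update
        losses.append(loss.item())
    return losses


losses_with_print = train(print_grad=True)
losses_without_print = train(print_grad=False)
print("identical losses:", losses_with_print == losses_without_print)
```

If the two loss lists match on your setup, the print is not what changed the training, and the difference most likely comes from something else that varied between runs.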

ortasa · July 9, 2018, 6:18pm

Yes, you are right. It was a hyperparameter that made the change, not the print.
Sorry.
Ortal