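Hi, I think that setting the grad to None instead of 0 (i.e. `optimizer.zero_grad(set_to_none=True)`) would solve the momentum problem: the optimizer skips parameters whose grad is None, whereas a grad of 0 still gets a step from the momentum buffer. A grad of nan would not help, since it would propagate into the buffer and the weights.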
https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
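
To illustrate, here is a toy pure-Python sketch of one momentum update with a grad of 0, None, and nan. This is not PyTorch's actual code: `sgd_momentum_step` and the numbers are made up for the example, and it only assumes the behaviour described in the docs linked above (a step is still taken for a grad of 0, and skipped altogether for a grad of None).

```python
import math


def sgd_momentum_step(param, grad, buf, lr=0.1, momentum=0.9):
    """One toy SGD-with-momentum update for a single scalar parameter.

    Hypothetical helper for illustration only. A grad of None means
    "skip this parameter"; any numeric grad (including 0) is applied.
    """
    if grad is None:
        # Skipped: neither the parameter nor the momentum buffer changes.
        return param, buf
    buf = momentum * buf + grad
    return param - lr * buf, buf


# Pretend an earlier step left this state behind.
param, buf = 1.0, 0.5

p_zero, b_zero = sgd_momentum_step(param, 0.0, buf)
p_none, b_none = sgd_momentum_step(param, None, buf)
p_nan, b_nan = sgd_momentum_step(param, float("nan"), buf)

print("grad=0    ->", p_zero, b_zero)   # parameter still moves via old momentum
print("grad=None ->", p_none, b_none)   # parameter and buffer untouched
print("grad=nan  ->", p_nan, b_nan, "poisoned:", math.isnan(p_nan))
```

So with a grad of 0 the parameter keeps drifting on the old momentum, with None it is left alone, and with nan both the parameter and the buffer become nan.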