Hi, I think that setting the grad to None instead of 0 (zero_grad with set_to_none=True) would solve the momentum problem:
https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html
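
To show why this matters, here is a minimal pure-Python sketch of a momentum update. It is a toy stand-in, not PyTorch's actual code: `momentum_step` and its default values are hypothetical names chosen for illustration. It only mirrors the behaviour described in the linked docs, where the optimizer still does the step for a gradient of 0 but skips it altogether for a gradient of None.

```python
def momentum_step(param, grad, buf, lr=0.1, momentum=0.9):
    """One toy SGD-with-momentum update for a single scalar parameter.

    Returns the new (param, buf). A grad of None means "skip this parameter".
    This is an illustrative assumption, not the real torch.optim implementation.
    """
    if grad is None:
        # Skipped entirely: neither the parameter nor the momentum buffer changes.
        return param, buf
    # A grad of 0 still decays the buffer and moves the parameter along it.
    buf = momentum * buf + grad
    return param - lr * buf, buf


# Start with some velocity left over from earlier steps.
param, buf = 1.0, 0.5

zero_param, zero_buf = momentum_step(param, 0.0, buf)
none_param, none_buf = momentum_step(param, None, buf)

print("grad = 0:   ", zero_param, zero_buf)
print("grad = None:", none_param, none_buf)
```

With a zero grad the parameter keeps drifting because of the old momentum buffer; with None it stays where it was.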