Hi @ptrblck,
Regarding the fact that parameters are still updated even when the grad is zero (because of momentum, weight decay, etc.), will setting the grads to None solve this issue? I.e., calling optimizer.zero_grad(set_to_none=True)?
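For context, here is a minimal sketch of how this could be checked. It assumes plain SGD with momentum and weight decay; the helper name `step_after_clearing` and the toy parameter are made up for illustration. As far as I understand, the built-in optimizers skip any parameter whose `.grad` is `None`, so the second call should leave the parameter untouched, while zeroed grads still let momentum and weight decay move it.

```python
import torch


def step_after_clearing(set_to_none: bool) -> bool:
    """Return True if the parameter changes on a step taken right after zero_grad."""
    # Hypothetical toy parameter, only for illustration.
    p = torch.nn.Parameter(torch.ones(3))
    opt = torch.optim.SGD([p], lr=0.1, momentum=0.9, weight_decay=0.01)

    # One real step so the momentum buffer is populated.
    p.sum().backward()
    opt.step()

    # Clear the gradients, then step again without a new backward pass.
    opt.zero_grad(set_to_none=set_to_none)
    before = p.detach().clone()
    opt.step()
    return not torch.equal(before, p.detach())


changed_with_zeros = step_after_clearing(set_to_none=False)
changed_with_none = step_after_clearing(set_to_none=True)
print("zeroed grads -> parameter changed:", changed_with_zeros)
print("None grads   -> parameter changed:", changed_with_none)
```

This only covers SGD; other optimizers or a custom training loop may behave differently, so it is worth rerunning with the optimizer actually in use.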
Thanks!