Default value of set_to_none in Adam's zero_grad


Is there a reason why the default value of set_to_none in Adam.zero_grad() hasn’t been changed to True? With the current default, parameters still get updated even when they aren’t used to compute the loss. I suppose this also holds true for other similar optimisers. Setting it to True seems like what you would expect, no?
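
A minimal sketch that reproduces the behaviour, assuming a PyTorch version where zero_grad() accepts set_to_none. The flag is passed explicitly so the result doesn't depend on whatever the installed default is. The names run, used and unused are made up for illustration.

```python
import torch


def run(set_to_none):
    """Return the value of `unused` after step 1 and after step 2."""
    used = torch.nn.Parameter(torch.ones(1))
    unused = torch.nn.Parameter(torch.ones(1))
    opt = torch.optim.Adam([used, unused], lr=0.1)

    # Step 1: both parameters feed the loss, so Adam builds state for both.
    (used + unused).sum().backward()
    opt.step()
    after_first = unused.item()

    # Step 2: only `used` feeds the loss.
    opt.zero_grad(set_to_none=set_to_none)
    used.sum().backward()
    opt.step()
    return after_first, unused.item()


zeroed = run(set_to_none=False)   # unused.grad is a zero tensor in step 2
cleared = run(set_to_none=True)   # unused.grad is None in step 2

print("set_to_none=False:", zeroed)
print("set_to_none=True: ", cleared)
```

In this setup I would expect unused to keep moving in step 2 with set_to_none=False, and to stay put with set_to_none=True.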

I think the set_to_none argument defaults to False for backward compatibility reasons.
Note that it’s a performance improvement (it avoids an accumulation kernel that adds the new gradients to zeros) and wasn’t introduced to dodge the side effect of parameter updates with a zero gradient for optimizers with momentum.
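
To illustrate the momentum side effect, here is a rough pure-Python sketch of the textbook Adam update for a single scalar. It is not PyTorch's actual implementation; adam_step and two_steps are hypothetical helpers and the hyperparameters are arbitrary. The only assumption carried over from PyTorch is that a parameter whose gradient is None is skipped by the optimizer.

```python
import math


def adam_step(param, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One scalar Adam update; returns the new parameter value."""
    if grad is None:
        # No gradient at all: the parameter is skipped and its state is untouched.
        return param
    state["t"] += 1
    # Running averages of the gradient and of the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias-corrected estimates.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


def two_steps(second_grad):
    """Take one step with grad=1.0, then one step with `second_grad`."""
    state = {"t": 0, "m": 0.0, "v": 0.0}
    first = adam_step(1.0, 1.0, state)
    second = adam_step(first, second_grad, state)
    return first, second


with_zero = two_steps(0.0)    # zero gradient: leftover momentum still moves the value
with_none = two_steps(None)   # no gradient: the value is left alone

print("grad = 0.0: ", with_zero)
print("grad = None:", with_none)
```

With a zero gradient the running average m decays but is still nonzero after the first step, so the update term is nonzero as well. With None the step is skipped entirely.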