Are there two valid Gradient Descent approaches in PyTorch?

albanD (Alban D) December 16, 2024, 4:07pm 2

Yes, both give the same result, up to numerical precision.
They have different runtime/memory tradeoffs, though.
See details here: Why do we need to set the gradients manually to zero in pytorch? - #20 by albanD
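
To make the equivalence concrete, here is a minimal sketch. The two approaches from the question are not quoted in this reply, so this assumes they are (A) calling `backward()` once per batch and letting gradients accumulate in `.grad`, and (B) summing the losses and calling `backward()` once. The toy model, batch shapes, and variable names are hypothetical and only for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: a small linear model and two mini-batches.
model = torch.nn.Linear(3, 1)
batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(2)]
loss_fn = torch.nn.MSELoss()

# Approach A (assumed): backward() per batch; gradients accumulate in .grad.
# Each batch's graph can be freed right after its own backward() call.
model.zero_grad()
for x, y in batches:
    loss_fn(model(x), y).backward()
grads_accumulated = [p.grad.clone() for p in model.parameters()]

# Approach B (assumed): sum the losses, then a single backward().
# All per-batch graphs stay alive until that one backward() call.
model.zero_grad()
total_loss = sum(loss_fn(model(x), y) for x, y in batches)
total_loss.backward()
grads_summed = [p.grad.clone() for p in model.parameters()]

# Compare with a tolerance, since the two may differ by floating-point rounding.
same = all(
    torch.allclose(a, b, atol=1e-6)
    for a, b in zip(grads_accumulated, grads_summed)
)
print(same)
```

Under these assumptions, approach A tends to use less peak memory because only one batch's graph is alive at a time, while approach B runs a single backward pass over a larger graph. Which one is faster in practice depends on the model and hardware.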
