Are there two valid Gradient Descent approaches in PyTorch?

Yes, both give the same result, up to numerical precision.
They do differ in their runtime/memory tradeoff, though.
See details here: Why do we need to set the gradients manually to zero in pytorch? - #20 by albanD
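
For a quick sanity check, here is a minimal sketch. It assumes the two approaches being compared are (A) summing the losses and calling `backward()` once, and (B) calling `backward()` on each loss and letting `.grad` accumulate. That reading is a guess based on the linked thread; the toy model, data, and variable names are made up for illustration.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy setup: a small linear model and two mini-batches.
model = torch.nn.Linear(3, 1)
loss_fn = torch.nn.MSELoss()
batches = [(torch.randn(4, 3), torch.randn(4, 1)) for _ in range(2)]

# Approach A (assumed): sum the losses, then call backward() once.
# Every forward graph stays alive until that single backward call.
model.zero_grad()
total_loss = sum(loss_fn(model(x), y) for x, y in batches)
total_loss.backward()
grads_a = [p.grad.clone() for p in model.parameters()]

# Approach B (assumed): call backward() per batch; .grad accumulates.
# Each graph can be freed right after its own backward call.
model.zero_grad()
for x, y in batches:
    loss_fn(model(x), y).backward()
grads_b = [p.grad.clone() for p in model.parameters()]

# The two gradient sets should agree up to floating-point error.
grads_match = all(
    torch.allclose(a, b, atol=1e-6) for a, b in zip(grads_a, grads_b)
)
print(grads_match)
```

Under that assumption, A keeps all the graphs in memory at once, while B holds only one at a time but runs a separate backward pass per batch.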
