Yes, both give the same result numerically (up to floating-point precision).
They have different runtime/memory tradeoffs, though.
See details here: Why do we need to set the gradients manually to zero in pytorch? - #20 by albanD
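As a rough illustration, here is a minimal sketch that assumes the two approaches being compared are (a) one `backward()` call on a summed loss and (b) separate `backward()` calls that accumulate into `.grad`. The model, data shapes, and variable names are made up for the example and are not from the thread.

```python
import torch

torch.manual_seed(0)

# Hypothetical toy model and data, chosen only for illustration.
model = torch.nn.Linear(4, 1)
loss_fn = torch.nn.MSELoss()
x1, y1 = torch.randn(8, 4), torch.randn(8, 1)
x2, y2 = torch.randn(8, 4), torch.randn(8, 1)

# Approach A: sum the losses, then run a single backward pass.
# Both autograd graphs stay alive until backward() is called.
model.zero_grad()
total_loss = loss_fn(model(x1), y1) + loss_fn(model(x2), y2)
total_loss.backward()
grad_summed = model.weight.grad.clone()

# Approach B: one backward pass per loss. Gradients accumulate in .grad,
# and each graph can be freed as soon as its own backward() finishes.
model.zero_grad()
loss_fn(model(x1), y1).backward()
loss_fn(model(x2), y2).backward()
grad_accumulated = model.weight.grad.clone()

# Compare with a tolerance, since summation order can change rounding.
grads_match = torch.allclose(grad_summed, grad_accumulated, atol=1e-6)
print(grads_match)
```

Under these assumptions, approach A tends to hold more memory at once but runs a single backward pass, while approach B runs more backward passes with a smaller peak footprint.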