Did you mean an interface like optimizer.zero_grad() or t.grad.zero_()? But what if my backward is a one-pass operation?
I mean that, for the code below, it would be better if the gradient of t1 were ([1.]), even though t1 actually contributes to two
independent elements of t2, because in many cases different indices of a tensor are independent or computed in parallel.
Both t1.grad.zero_() and optimizer.zero_grad() will zero out the gradient.
If you don’t call either of those, t1.grad accumulates across backward passes. With just one backward pass, t1.grad is simply the gradient from that pass. If you call backward again without zeroing out the grad, t1.grad becomes the old value plus the new value, and so on.
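
Here is a minimal sketch of that accumulation behaviour. The tensor value and the `t1 * 2` expression are made up for illustration and are not the code from the question; only `requires_grad`, `backward()`, `.grad` and `zero_()` are standard PyTorch.

```python
import torch

# Hypothetical leaf tensor, just for illustration.
t1 = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(2 * t1)/d(t1) = 2, so t1.grad holds 2.
(t1 * 2).sum().backward()
grad_after_first = t1.grad.clone()

# Second backward pass without zeroing: the new gradient (2) is
# added to the old one, so t1.grad now holds 4.
(t1 * 2).sum().backward()
grad_after_second = t1.grad.clone()

# Zero the gradient in place, then run backward again:
# t1.grad is back to the gradient of a single pass.
t1.grad.zero_()
(t1 * 2).sum().backward()
grad_after_zero = t1.grad.clone()

print(grad_after_first, grad_after_second, grad_after_zero)
```

When training with an optimizer, calling optimizer.zero_grad() before each backward pass plays the same role as t1.grad.zero_() does here.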