What happens if loss.backward() is called multiple times without optimizer.step()?
How would the gradients be updated? Is it the sum of the gradients from each backward() call?
Yes, gradients are accumulated (summed) into each parameter's .grad attribute on every backward() call, and they stay there until you clear them (e.g. with optimizer.zero_grad()).
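A minimal sketch to see this in action (the tensor and values below are just illustrative, not from the original thread):

```python
import torch

# A toy "parameter" with gradients enabled.
w = torch.tensor([2.0, 3.0], requires_grad=True)

loss = (w * w).sum()          # d(loss)/dw = 2*w = [4., 6.]

# First backward pass; retain_graph=True lets us call backward() again
# on the same graph for demonstration purposes.
loss.backward(retain_graph=True)
print(w.grad)                 # tensor([4., 6.])

# Second backward pass without zeroing: gradients are summed.
loss.backward()
print(w.grad)                 # tensor([8., 12.])

# Clearing the accumulated gradients (normally via optimizer.zero_grad()).
w.grad.zero_()
print(w.grad)                 # tensor([0., 0.])
```

This accumulation behavior is also what makes gradient accumulation across several mini-batches possible: call backward() on each mini-batch's loss and only then call optimizer.step() followed by optimizer.zero_grad().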