Difference between 2 methods for accumulating gradients

Is there any difference between these 2 methods for accumulating gradients?

  1. Accumulate with the averaged loss
accum_loss = 0

for _ in range(10):
    out = model(x)
    loss = get_loss(out, y)
    accum_loss += loss                      # keeps every iteration's graph alive

optimizer.zero_grad()
accum_loss /= 10                            # average the summed loss over the 10 iterations
accum_loss.backward(retain_graph=True)
optimizer.step()
  2. Accumulate with autograd’s backward function
optimizer.zero_grad()
for _ in range(10):
    out = model(x)
    loss = get_loss(out, y)
    loss.backward()                         # accumulates into .grad and frees this iteration's graph

optimizer.step()

Help me please…

These are the differences I could see between the 2 methods of accumulating gradients:

  1. Method 1 uses more memory, because it keeps the computation graphs of all 10 iterations alive until the single backward() call. Method 2 is memory-efficient: each iteration's graph is freed as soon as its loss.backward() returns.

  2. In Method 1 you divide the accumulated loss, and therefore the resulting gradients, by 10 (i.e., you average over the 10 iterations). In Method 2 no averaging takes place; the per-iteration gradients are simply summed, so you would scale each loss by 1/10 inside the loop to make the two equivalent (see the sketch and the check below).
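
For reference, a minimal sketch of that scaled variant of Method 2, assuming model, x, y, get_loss, and optimizer are defined exactly as in your snippets:

optimizer.zero_grad()
for _ in range(10):
    out = model(x)
    loss = get_loss(out, y) / 10            # scale so the summed gradients equal the average
    loss.backward()                         # this iteration's graph is freed here
optimizer.step()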

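If you want to convince yourself, here is a small self-contained check that the two approaches produce the same gradients once the loss is scaled. It uses a toy nn.Linear model, random data, and MSELoss purely as stand-ins, since your actual model and get_loss aren’t shown:

import torch
import torch.nn as nn

# Toy stand-ins (assumptions) for the model, data, and loss in the question.
torch.manual_seed(0)
model = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
get_loss = nn.MSELoss()

# Method 1: sum the losses, average once, single backward over all retained graphs.
model.zero_grad()
accum_loss = 0
for _ in range(10):
    accum_loss += get_loss(model(x), y)
(accum_loss / 10).backward()
grads_method1 = [p.grad.clone() for p in model.parameters()]

# Method 2 with per-iteration scaling: backward inside the loop, graph freed each time.
model.zero_grad()
for _ in range(10):
    (get_loss(model(x), y) / 10).backward()
grads_method2 = [p.grad.clone() for p in model.parameters()]

print(all(torch.allclose(g1, g2) for g1, g2 in zip(grads_method1, grads_method2)))  # True

The gradients match; the remaining difference is peak memory, since Method 1 holds all 10 graphs until its single backward() call.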

Thank you very much!