I am training a deep network, and because the training data is very large, I can't feed the model with a bigger batch size. I wonder if I can feed a small batch each time and only optimize the model after a specified number of backward passes?
If possible, how can I average the gradient values before updating the parameters?
Thank you!!!
num_batches = 0
for sample, target in dataset:
    out = model(sample)
    loss = loss_fn(out, target)
    loss.backward()
    num_batches += 1
    if num_batches == 10:  # optimize every 10 mini-batches
        optimizer.step()
        model.zero_grad()  # or optimizer.zero_grad()
        num_batches = 0
Hi smth,
It seems backward() only accumulates (sums) the gradients; it does not average them?
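For reference, one common way to get an averaged update instead of a summed one is to divide each mini-batch loss by the number of accumulation steps before calling backward(), since scaling the loss scales its gradients by the same factor. The sketch below is only an illustration of that idea, reusing the model, loss_fn, dataset, and optimizer names from the snippet above; accumulation_steps is a placeholder value, not something from the original answer:

accumulation_steps = 10  # placeholder: number of mini-batches to accumulate

optimizer.zero_grad()
for i, (sample, target) in enumerate(dataset):
    out = model(sample)
    loss = loss_fn(out, target)
    # Dividing the loss here means the accumulated gradients
    # end up averaged over the accumulation window.
    (loss / accumulation_steps).backward()
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()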