I am missing something here. If we zero out the current gradient values with optimizer.zero_grad(), then how does the update to the weights happen? For an update to happen we should have values in weight so that we can do something like weight -= lr * grad?
Thanks.
You call zero_grad() at the start of each mini-batch. If you do not, your gradients accumulate from one batch to the next. zero_grad() only zeroes out the gradients, not the weights. Once you do a forward() and loss.backward(), the gradients are computed again and stored on the parameters, and optimizer.step() then uses them to update the weights. If you do want to accumulate gradients, you can do so by not calling optimizer.zero_grad().
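To make that concrete, here is a minimal sketch, not taken from the tutorial. The single parameter w, the toy loss in compute_loss(), and the learning rate are all made up for illustration. It shows that a second backward() adds to the existing gradient, that zero_grad() leaves the weights alone, and that step() applies the update from a fresh gradient.

```python
import torch

# Hypothetical toy setup: one parameter tensor and plain SGD.
w = torch.tensor([1.0, 2.0], requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

def compute_loss():
    # Toy loss (an assumption for this sketch); its gradient is 2 * w.
    return (w ** 2).sum()

# 1) backward() fills w.grad.
compute_loss().backward()
grad_once = w.grad.clone()

# 2) Without zero_grad(), a second backward() adds to the stored gradient.
compute_loss().backward()
grad_accumulated = w.grad.clone()

# 3) zero_grad() resets the gradients only; the weights are untouched.
weights_before = w.detach().clone()
optimizer.zero_grad()
weights_after_zero = w.detach().clone()

# 4) A fresh backward() recomputes the gradient, and step() applies
#    w -= lr * grad using that fresh value.
compute_loss().backward()
optimizer.step()

print("gradient after one backward: ", grad_once)
print("gradient after two backwards:", grad_accumulated)
print("weights unchanged by zero_grad:", torch.equal(weights_before, weights_after_zero))
print("weights after step:", w.detach())
```

So the order inside one iteration matters: the gradient is cleared first, then rebuilt by backward(), and only then consumed by step().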
Hmm, I see, it is done before every backward call:
for epoch in range(2):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:  # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0
print('Finished Training')
Here data is one batch then?
Yes. As you can see, zero_grad() is called at the start of each loop iteration, before the forward and backward pass on that batch.
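If you want to check this yourself, here is a small sketch with made-up data. The tensors, sizes, and batch_size below are assumptions for illustration and are not the tutorial's trainloader. Each item the loop yields is one mini-batch holding [inputs, labels].

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 10 samples with 3 features each, plus integer labels.
features = torch.randn(10, 3)
targets = torch.randint(0, 2, (10,))
loader = DataLoader(TensorDataset(features, targets), batch_size=4)

batch_shapes = []
for i, data in enumerate(loader, 0):
    # data is one mini-batch: a list of [inputs, labels]
    inputs, labels = data
    batch_shapes.append((tuple(inputs.shape), tuple(labels.shape)))
    print(i, tuple(inputs.shape), tuple(labels.shape))
```

With 10 samples and a batch size of 4, the loop runs three times and the last batch is smaller than the others.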