@tom, since it is possible to accumulate the loss over several minibatches and then do a single parameter update, I would like to update the parameters once every 64 minibatches. I have the following code:
import torch
from torch.autograd import Variable

total_loss = Variable(torch.zeros(1), requires_grad=True)
for idx, (data, target) in enumerate(train_loader):
    data, target = Variable(data), Variable(target)
    output = model(data)
    loss = criterion(output, target)
    total_loss = total_loss + loss          # accumulate the loss (keeps each minibatch graph alive)
    if (idx + 1) % 64 == 0:
        total_loss = total_loss / (64 * batchsize)
        total_loss.backward()               # backprop through all 64 minibatches at once
        optimizer.step()
        optimizer.zero_grad()
        total_loss = Variable(torch.zeros(1), requires_grad=True)  # reset the accumulator
Is the above code correct to achieve the desired effect?
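For reference, I also considered calling backward() on each minibatch loss and only stepping every 64 minibatches, which (if I understand correctly) accumulates the gradients in .grad without keeping all 64 graphs in memory. A rough sketch of that variant, assuming the criterion already averages over each minibatch so I only divide by the extra factor of 64:

# Per-minibatch backward variant (sketch): gradients accumulate in .grad,
# and each minibatch's graph is freed right after its backward() call.
optimizer.zero_grad()
for idx, (data, target) in enumerate(train_loader):
    data, target = Variable(data), Variable(target)
    output = model(data)
    loss = criterion(output, target) / 64   # scale so the 64 accumulated gradients average out
    loss.backward()                         # adds this minibatch's gradients to .grad
    if (idx + 1) % 64 == 0:
        optimizer.step()                    # one parameter update per 64 minibatches
        optimizer.zero_grad()

Would both versions give the same update, apart from the memory usage?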