There's no need to call actual_loss; in fact, there's no need to calculate it at all, since it doesn't affect training.
In your case, though, I think the training loop should look like this:
import matplotlib.pyplot as plt

# Assumes network, criterion, optimizer and train_loader are already defined.
accumulations = 2
scaled_loss = 0.0
epochs = 20
training_steps_losses = []

for epoch in range(epochs):
    for idx, (data, target) in enumerate(train_loader):
        out1 = network(data)
        loss1 = criterion(out1, target1)  # target1 comes from your setup; it is not defined in this snippet
        out2 = network(data)
        loss2 = criterion(out2, target)
        # Assumption: the two losses are summed. Keep them as tensors here
        # (no .item()), otherwise .backward() cannot be called on the result.
        total_loss = (loss1 + loss2) / accumulations

        # Here you calculate the gradients.
        # Normally optimizer.step() would follow right away, but not in this case.
        # total_loss is divided by accumulations so that the accumulated gradients
        # have the same scale by the time optimizer.step() is called.
        total_loss.backward()

        # Not required for training; only used to monitor the loss as the parameters are updated.
        scaled_loss += total_loss.item()

        # optimizer.step() is only called when the batch index (idx) + 1
        # is divisible by accumulations.
        # The main idea: .backward() is called accumulations times, and because
        # optimizer.zero_grad() is not called in between, the gradients of all
        # the parameters add up across those calls.
        # After that, call optimizer.step() followed by optimizer.zero_grad().
        if (idx + 1) % accumulations == 0:
            optimizer.step()
            optimizer.zero_grad()
            # No need to divide scaled_loss here: total_loss was already divided by accumulations.
            training_steps_losses.append(scaled_loss)
            scaled_loss = 0.0

# After training is done, you can plot the losses against the training iterations:
plt.plot(training_steps_losses, label='Training Loss')  # this is the only use of scaled_loss
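To make the stepping condition concrete, here is a small sketch of which batch indices trigger a parameter update when the check is (idx + 1) % accumulations == 0. The helper names step_indices and leftover_batches are made up for this illustration; they are not part of PyTorch.

```python
def step_indices(num_batches, accumulations):
    """Batch indices after which optimizer.step() would be called."""
    return [idx for idx in range(num_batches) if (idx + 1) % accumulations == 0]


def leftover_batches(num_batches, accumulations):
    """Trailing batches in an epoch whose gradients do not reach a step."""
    return num_batches % accumulations


# 5 batches per epoch, accumulating over 2 batches at a time
print(step_indices(5, 2))      # [1, 3]
print(leftover_batches(5, 2))  # 1
```

If the number of batches per epoch is not a multiple of accumulations, the gradients from the leftover batches stay in .grad and are added to the first batches of the next epoch. Whether to step on them, zero them, or leave them is a choice for your setup.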
Once again: calculate the loss and the parameter gradients for the next accumulations batches, letting the gradients of each parameter add up, and only then update the parameters.
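As a rough check of why the loss is divided by accumulations, the sketch below uses plain Python instead of PyTorch. It assumes a toy one-parameter model y = w * x, a mean-reduced squared-error loss, and equal-sized batches; mse_grad and the data are invented for the illustration. Under those assumptions, the summed gradients of the scaled losses match the gradient of the loss over the combined batch.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

accumulations = 2
batch_size = len(xs) // accumulations

# Mimic calling .backward() on (loss / accumulations) for each batch
# without zeroing the gradient in between.
accumulated_grad = 0.0
for k in range(accumulations):
    batch_x = xs[k * batch_size:(k + 1) * batch_size]
    batch_y = ys[k * batch_size:(k + 1) * batch_size]
    accumulated_grad += mse_grad(w, batch_x, batch_y) / accumulations

# Gradient of the same loss computed on all four samples at once.
full_batch_grad = mse_grad(w, xs, ys)

print(accumulated_grad, full_batch_grad)  # -22.5 -22.5
```

Without the division, the accumulated gradient would be accumulations times larger, which acts like a larger learning rate.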
And sorry for not including these explanations in the first place.