Hi,
how is the loss calculated during training:
is it a batchwise average or an average over the entire dataset?
Typically the loss is calculated for each batch separately.
In other words, inside your training loop that iterates over batches, you have these commands:
```python
optimizer.zero_grad()
loss = calculate_loss(*args)
loss.backward()
optimizer.step()
```

So `loss` is overwritten for each batch, which is the same cadence at which the optimizer steps in the gradient direction.
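With the common default of mean reduction, each batch's loss is the average over the samples in that batch, so the value you see each iteration is a batchwise average, not an average over all data seen so far. A minimal sketch in plain Python (the per-sample loss values and batch size are hypothetical) of how the two averages differ:

```python
# Hypothetical per-sample losses for one epoch of six samples.
per_sample_losses = [0.9, 0.7, 0.8, 0.2, 0.4, 0.3]
batch_size = 3

# Split into batches, as a DataLoader-style loop would.
batches = [per_sample_losses[i:i + batch_size]
           for i in range(0, len(per_sample_losses), batch_size)]

# What the training loop sees: one mean loss per batch (per optimizer step).
batch_losses = [sum(b) / len(b) for b in batches]

# The overall data average: only available if you aggregate it yourself,
# e.g. by accumulating a running sum of losses weighted by batch size.
epoch_loss = sum(per_sample_losses) / len(per_sample_losses)

print([round(x, 2) for x in batch_losses])  # one value per batch
print(round(epoch_loss, 2))                 # single value for the epoch
```

If you want an epoch-level loss to log, accumulate `loss.item() * batch_len` over the loop and divide by the dataset size at the end; simply averaging the batch means is only correct when every batch has the same size.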