Calculating training loss - sum of avg batch losses or avg loss across all samples

Hi,

how is the loss calculated during training,

is it a batchwise average or an average over all samples?

Typically the loss is calculated for each batch separately.

In other words, inside your training loop that iterates over batches, you have these commands:

optimizer.zero_grad()
loss = calculate_loss(*args)
loss.backward()
optimizer.step()

So loss is computed and overwritten once per batch, which matches the cadence at which the optimizer steps in the gradient direction.
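One consequence worth knowing: if you log a running average of the per-batch loss values, that "average of batch averages" only equals the true average over all samples when every batch has the same size. Here is a minimal sketch with made-up loss values and deliberately unequal batches to show the difference (the numbers are purely illustrative):

```python
# Hypothetical per-sample losses grouped into unequal batches (values assumed):
batch_losses = [[1.0, 2.0, 3.0, 4.0],  # batch of 4 samples
                [10.0, 20.0]]          # batch of 2 samples

# Averaging the per-batch means -- what a naive running average of
# loss.item() gives you when the loss uses mean reduction:
avg_of_batch_means = sum(sum(b) / len(b) for b in batch_losses) / len(batch_losses)

# True average over all samples: weight each batch by its size.
overall_mean = sum(sum(b) for b in batch_losses) / sum(len(b) for b in batch_losses)

print(avg_of_batch_means)  # 8.75
print(overall_mean)        # 6.666...
```

With equal batch sizes the two quantities coincide, so in practice the discrepancy usually only shows up because of a smaller final batch at the end of an epoch.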