# On running loss and average loss

[ 1. Context ]
I’m following Udacity’s tutorial Intro to Deep Learning with PyTorch. In Lesson 5 (Convolutional Neural Networks, by Cezanne Camacho), Step 10 “Training the Network”, Cezanne uses the code quoted below to calculate the training loss and validation loss.

[ 2. Question ]
Although Cezanne explains this in the video, I’m still not clear why she uses `train_loss += loss.item()*data.size(0)` to accumulate the total training loss and `train_loss = train_loss/len(train_loader.sampler)` to calculate the average training loss, with the criterion being `nn.CrossEntropyLoss()`, whereas in the earlier Fashion-MNIST tutorial in the same series these were coded as `running_loss += loss.item()` and `running_loss/len(trainloader)`, with the criterion being `nn.NLLLoss()`.

[ 3. My understanding ]

1. I’m following Andrew Ng’s distinction between “loss: difference between prediction and target for a single sample” and “cost: that loss averaged over the entire sample set”.
2. `loss.item()` is the per-sample cross-entropy, i.e. `-sum(target*log(prediction))`, averaged across all training examples of the current batch, according to the definition of cross-entropy loss.
3. Therefore, `loss.item()*data.size(0)` is the “total loss of the current batch (not averaged)”.
4. `train_loss` then accumulates these per-batch totals over the entire epoch, i.e. it holds the “total loss of the current epoch”.
5. Finally, `train_loss = train_loss/len(train_loader.sampler)` calculates the “loss averaged across all training examples for the current epoch”.
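
The arithmetic in steps 2 and 3 can be checked with a small sketch. It uses plain Python as a stand-in for what `nn.CrossEntropyLoss()` computes with its default mean reduction; the helper `cross_entropy` and the toy logits and targets are made up for illustration and are not part of the tutorial.

```python
import math

def cross_entropy(logits, target):
    """Per-sample cross-entropy: -log(softmax(logits)[target])."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# hypothetical batch: 3 samples, 2 classes each
batch_logits = [[2.0, 0.5], [0.1, 1.2], [1.5, 1.5]]
batch_targets = [0, 1, 0]

per_sample = [cross_entropy(z, t) for z, t in zip(batch_logits, batch_targets)]

# what loss.item() would hold under mean reduction
batch_mean = sum(per_sample) / len(per_sample)

# the analogue of loss.item() * data.size(0)
batch_total = batch_mean * len(per_sample)

# multiplying the batch mean by the batch size recovers the unaveraged sum
assert math.isclose(batch_total, sum(per_sample))
```

Multiplying the batch mean by the batch size simply undoes the averaging, which is why the accumulated value can later be divided by the number of samples instead of the number of batches.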

May I ask if the interpretation above is correct?

```python
import numpy as np
import torch
import torch.nn as nn

# NOTE: `model`, `train_loader` and `valid_loader` are defined earlier in the
# tutorial notebook. The name `valid_loader` is assumed here, because the loop
# headers were missing from the quoted code.

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()

# specify optimizer (stochastic gradient descent) and learning rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# number of epochs to train the model
n_epochs = 50

# initialize tracker for minimum validation loss
valid_loss_min = np.inf  # set initial "min" to infinity

for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()  # prep model for training
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss (averaged over the batch)
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss (batch mean * batch size = batch sum)
        train_loss += loss.item() * data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()  # prep model for evaluation
    for data, target in valid_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update running validation loss
        valid_loss += loss.item() * data.size(0)

    # print training/validation statistics
    # calculate average loss over an epoch (divide by the number of samples)
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)
    print(f"Epoch: {epoch + 1} \tTraining Loss: {train_loss:.6f} \tValidation Loss: {valid_loss:.6f}")
```

The second approach, summing the batch-averaged losses and dividing by the number of batches (`running_loss/len(trainloader)`), yields the same result as long as every batch in the epoch contains exactly `batch_size` samples. That is not always the case: if the length of the dataset is not divisible by `batch_size`, the last batch contains fewer samples, yet its averaged loss is weighted like that of a full batch, which introduces a small bias into the epoch average.
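
A small numeric sketch may make the bias concrete. The per-sample losses below are invented for illustration, and the two variables `weighted` and `unweighted` are hypothetical names for the two ways of averaging discussed above; no PyTorch is needed to see the effect.

```python
import math

# hypothetical per-sample losses for a dataset of 10 samples
per_sample_losses = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4, 1.6, 3.0, 5.0]
batch_size = 4  # 10 is not divisible by 4, so the last batch has only 2 samples

# split into batches of sizes 4, 4 and 2
batches = [
    per_sample_losses[i:i + batch_size]
    for i in range(0, len(per_sample_losses), batch_size)
]

# each entry plays the role of loss.item() for one batch (mean reduction)
batch_means = [sum(b) / len(b) for b in batches]

# approach 1: batch mean * batch size, summed, divided by the number of samples
weighted = sum(m * len(b) for m, b in zip(batch_means, batches)) / len(per_sample_losses)

# approach 2: batch means summed, divided by the number of batches
unweighted = sum(batch_means) / len(batches)

# the exact mean over all samples, for comparison
true_mean = sum(per_sample_losses) / len(per_sample_losses)

print(f"weighted: {weighted:.4f}, unweighted: {unweighted:.4f}, true mean: {true_mean:.4f}")
```

In this toy case the first approach reproduces the exact per-sample mean (1.52), while the second gives roughly 1.93 because the two high-loss samples in the short last batch count as much as four samples in a full batch. With real data the gap is usually much smaller, and it disappears when every batch has the same size.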