[ 1. Context ]
I’m following Udacity’s tutorial Intro to Deep Learning with PyTorch. In Lesson 5 (Convolutional Neural Networks, by Cezanne Camacho), Step 10 “Training the Network”, Cezanne uses the code quoted below to calculate the training loss and the validation loss.
[ 2. Question ]
Although Cezanne explained this in the video, I’m still not clear why she uses `train_loss += loss.item()*data.size(0)` to accumulate the total training loss and `train_loss = train_loss/len(train_loader.sampler)` to calculate the average training loss, with the criterion being `nn.CrossEntropyLoss()`, whereas in the earlier Fashion-MNIST tutorial in the same series these were written as `running_loss += loss.item()` and `running_loss/len(trainloader)`, with the criterion being `nn.NLLLoss()`.
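To make sure I am comparing like with like, here is a minimal side-by-side sketch of the two bookkeeping schemes that I wrote for myself (the dataset, model and sizes below are made up, not from either tutorial); in both cases the criterion averages over the batch by default, so the only difference is how the per-batch numbers are combined at the end of the epoch:

import torch
import torch.nn as nn

# hypothetical stand-ins, not the tutorial's data or model:
# 10 samples, batch_size=4 -> batches of size 4, 4, 2
dataset = torch.utils.data.TensorDataset(torch.randn(10, 5), torch.randint(0, 3, (10,)))
loader = torch.utils.data.DataLoader(dataset, batch_size=4)
model = nn.Linear(5, 3)
criterion = nn.CrossEntropyLoss()  # averages over the batch by default (reduction='mean')

running_loss = 0.0  # Fashion-MNIST style: sum of per-batch means
train_loss = 0.0    # CNN-lesson style: sum of per-sample losses
for data, target in loader:
    loss = criterion(model(data), target)     # mean loss over this batch
    running_loss += loss.item()               # accumulate the batch mean
    train_loss += loss.item() * data.size(0)  # multiply back to the batch total

print(running_loss / len(loader))          # 3 batches -> average of the batch means
print(train_loss / len(loader.sampler))    # 10 samples -> per-sample average

With 10 samples and a batch size of 4, the last batch only has 2 samples, which seems to be exactly the case where the two denominators (len(loader) = 3 batches vs len(loader.sampler) = 10 samples) stop being interchangeable.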
[ 3. My understanding ]
- Following Andrew Ng’s distinction between “cost: difference between prediction and target for each sample” and “loss: difference between prediction and target for the entire sample set”:
- `loss.item()` is the value of the “total cost, or sum of target*log(prediction)”, averaged across all training examples of the current batch, according to the definition of cross-entropy loss.
- Therefore, `loss.item()*data.size(0)` is the “total loss of the current batch (not averaged)”.
- And `train_loss` accumulates these per-batch totals over the entire epoch, i.e. the “total loss of the current epoch”.
- Finally, `train_loss = train_loss/len(train_loader.sampler)` calculates the “cost/loss averaged across all training examples for the current epoch”.

May I ask if the interpretation above is correct?
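A small check I put together for the second and third bullets is below (the logits and labels are random placeholders, not from the lesson): multiplying the default mean reduction by the batch size should give the same number as reduction='sum'.

import torch
import torch.nn as nn

output = torch.randn(4, 10)           # placeholder logits: batch of 4, 10 classes
target = torch.randint(0, 10, (4,))   # placeholder labels

mean_loss = nn.CrossEntropyLoss()(output, target)                 # default: average over the batch
sum_loss = nn.CrossEntropyLoss(reduction='sum')(output, target)   # per-sample losses summed

# mean * batch size should recover the un-averaged batch total
print(torch.isclose(mean_loss * output.size(0), sum_loss))  # tensor(True)

For reference, here is the code from the lesson: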
import numpy as np
import torch
import torch.nn as nn

# model, train_loader and valid_loader are defined in earlier steps of the lesson

# specify loss function (categorical cross-entropy)
criterion = nn.CrossEntropyLoss()
# specify optimizer (stochastic gradient descent) and learning rate = 0.01
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# number of epochs to train the model
n_epochs = 50

# initialize tracker for minimum validation loss
valid_loss_min = np.Inf  # set initial "min" to infinity
for epoch in range(n_epochs):
    # monitor training loss
    train_loss = 0.0
    valid_loss = 0.0

    ###################
    # train the model #
    ###################
    model.train()  # prep model for training
    for data, target in train_loader:
        # clear the gradients of all optimized variables
        optimizer.zero_grad()
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # backward pass: compute gradient of the loss with respect to model parameters
        loss.backward()
        # perform a single optimization step (parameter update)
        optimizer.step()
        # update running training loss
        train_loss += loss.item()*data.size(0)

    ######################
    # validate the model #
    ######################
    model.eval()  # prep model for evaluation
    for data, target in valid_loader:
        # forward pass: compute predicted outputs by passing inputs to the model
        output = model(data)
        # calculate the loss
        loss = criterion(output, target)
        # update running validation loss
        valid_loss += loss.item()*data.size(0)

    # print training/validation statistics
    # calculate average loss over an epoch
    train_loss = train_loss/len(train_loader.sampler)
    valid_loss = valid_loss/len(valid_loader.sampler)