How to calculate the validation loss during each epoch of training?

Hi,

Question: I am trying to calculate the validation loss at every epoch of my training loop.

I know there are other forum threads about this, but I don’t understand what they are saying. I am using PyTorch Geometric, but I don’t think that particularly changes anything.

My code: this is what I currently have (a snippet from within my training function):

# Create lists to store the training and validation loss information
train_model_training_loss_ls = []
train_model_training_accuracy_ls = []
validation_model_training_ls = []

# train here...
for epoch in range(1, n_epochs + 1):
    model.train()
    optimizer.zero_grad()                                                  # Clear gradients.
    out = model(_data_.x, _data_.edge_index, _data_.edge_weight)           # Perform a single forward pass.
    loss = criterion(out[_data_.train_mask], _data_.y[_data_.train_mask])  # Compute the loss solely based on the training nodes.

    # Get validation losses
    validation_loss_local_train_fn = criterion(out[_data_.validation_mask], _data_.y[_data_.validation_mask])

    loss.backward()   # Derive gradients.
    optimizer.step()  # Update parameters based on gradients.

    # Append training and validation loss to lists
    train_model_training_loss_ls.append(loss)
    validation_model_training_ls.append(validation_loss_local_train_fn)

    # Print information out
    if epoch % 50 == 0:
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

My specific queries are:

  • Have I put the validation loss calculation in the right location?
  • Have I calculated the validation loss in the correct way? Lots of examples online seem to have a line like loss.item() * data.size(0) (see the sketch below).
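For reference, the generic pattern I keep seeing online looks roughly like this (just a sketch of my understanding, assuming an ordinary mini-batch setup with a hypothetical val_loader; it may not map cleanly onto my full-graph case):

model.eval()
running_val_loss = 0.0
with torch.no_grad():                                   # no gradients needed for validation
    for data, target in val_loader:                     # hypothetical validation DataLoader
        output = model(data)
        loss = criterion(output, target)                # assumes the default 'mean' reduction
        running_val_loss += loss.item() * data.size(0)  # undo the per-batch mean
epoch_val_loss = running_val_loss / len(val_loader.dataset)  # average over all samples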

[EDIT #1]: This is just for the calculation of the LOSS. I have a separate function to calculate accuracies which has the model.eval() line, but do I need it here as well?

Thanks in advance.

Usually you would call model.eval(), pass the validation data to the model, create the predictions, and calculate the validation loss.
In your current code snippet it seems you are reusing the out tensor, which was created from the training input data, and indexing it with the validation mask?
I’m not familiar with your use case, but this approach sounds a bit strange, so could you explain this use case a bit more?
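
For reference, that usual approach would look something like this minimal sketch (val_input and val_target stand in for however you create your validation batch):

model.eval()
with torch.no_grad():  # disable gradient tracking during validation
    val_output = model(val_input)
    val_loss = criterion(val_output, val_target)
print(f'Validation loss: {val_loss.item():.4f}')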

Hi,

Many thanks for taking the time to reply, it is much appreciated. I have since updated the code a bit (though I think the same issue persists), and will show what I have.

Responding to your post:

  1. Use case: I basically just want training and validation loss curves that can be plotted (with data from each epoch).

  2. model.eval(): yes, this makes sense, and it has since been included in my ‘newer’ code; please see below.

  3. Reusing the out tensor: now that you mention it, I am not exactly sure why I did that. I thought out was the output of the last layer of my GNN (which does binary classification), so to get the training loss I pass the output and the ground truth to the criterion. However, I only want the training data to contribute to that loss, so I use the training mask to select the training node IDs. Similarly, when calculating the validation loss, I only want to use the validation data in that calculation. Is there something wrong with doing that?

Updated form of the code:

# Create lists to store the training and validation loss information
train_model_training_loss_ls = []
train_model_training_accuracy_ls = []
validation_model_training_loss_ls = []
validation_model_training_accuracy_ls = []

# train here...
for epoch in range(1, n_epochs + 1):
    model.train()
    optimizer.zero_grad()                                                  # Clear gradients.
    out = model(_data_.x, _data_.edge_index, _data_.edge_weight)           # Perform a single forward pass.
    loss = criterion(out[_data_.train_mask], _data_.y[_data_.train_mask])  # Compute the loss solely based on the training nodes.
    loss.backward()                                                        # Derive gradients.
    optimizer.step()                                                       # Update parameters based on gradients.

    # Append training loss to list
    train_model_training_loss_ls.append(loss.item())

    # Obtain training accuracy curve
    training_accuracy_local = calculate_training_accuracy(model, data=_data_)
    train_model_training_accuracy_ls.append(training_accuracy_local)

    # Print information out
    if epoch % 50 == 0:
        print(f'Epoch: {epoch:03d}, Loss: {loss:.4f}')

    # Get validation losses
    model.eval()
    validation_loss_local_train_fn = criterion(out[_data_.validation_mask], _data_.y[_data_.validation_mask])
    validation_model_training_loss_ls.append(validation_loss_local_train_fn.item())

    # Validation accuracy by testing the model on the validation fold
    validation_accuracy = validate(model, data=_data_)
    validation_model_training_accuracy_ls.append(validation_accuracy)
Thanks for the help and please let me know if I need to explain things further.

[EDIT #1]: Sorry, I forgot to respond to your question about my desire to get the accuracies. I thought it would be a good idea to obtain accuracy curves (on the training and validation sets) to plot as well. Online I sometimes see train & val loss curves, and sometimes train & val accuracy plots (i.e. how the accuracy of the model on the respective dataset changes with each epoch).

Yes, this makes sense and I’m not concerned about it.

This would also be my assumption.
Based on your code you are then passing an input batch containing training and validation data to the model and indexing the output to split it into the training and validation outputs?
This would still be strange, as the common use case would be:

# train model
model.train()
train_input, train_target = ...
train_output = model(train_input)
train_loss = criterion(train_output, train_target)
# optimization
...

# validation
model.eval()
val_input, val_target = ...
val_output = model(val_input)
val_loss = criterion(val_output, val_target)
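
As a side note, model.eval() changes the behavior of layers such as dropout and batchnorm, so the model.train() call at the start of the next training iteration is what switches these layers back to training behavior.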

Thank you for your reply. Regarding your last point, yes I am passing in an input batch containing ALL the data: training, validation, and testing data. This is because I am doing semi-supervised classification / transductive learning, whereby we have access to all the feature information for the whole dataset, but only the labels of the training set.
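
Given that, I think my per-epoch loop should end up looking something like the sketch below (this is just how I understand the advice above, reusing my own variable names; the main change is a second, gradient-free forward pass in eval mode for the validation loss):

for epoch in range(1, n_epochs + 1):
    # Training: full-graph forward pass, loss computed on the training nodes only
    model.train()
    optimizer.zero_grad()
    out = model(_data_.x, _data_.edge_index, _data_.edge_weight)
    loss = criterion(out[_data_.train_mask], _data_.y[_data_.train_mask])
    loss.backward()
    optimizer.step()
    train_model_training_loss_ls.append(loss.item())

    # Validation: fresh forward pass in eval mode, no gradient tracking
    model.eval()
    with torch.no_grad():
        val_out = model(_data_.x, _data_.edge_index, _data_.edge_weight)
        val_loss = criterion(val_out[_data_.validation_mask], _data_.y[_data_.validation_mask])
    validation_model_training_loss_ls.append(val_loss.item())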