Getting the validation loss while training

Hello! I want to get the validation loss after each epoch while training. I have this piece of code:

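import torch
import torch.nn.functional as F

def train(model, train_loader, optimizer):
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.cuda(), target.cuda()
        output = model(data)

        # RMSE on the current training batch
        loss = torch.sqrt(F.mse_loss(output, target))

        # RMSE on the whole validation set, recomputed for every batch
        loss_validation = torch.sqrt(F.mse_loss(model(factors_val), product_val))

        if batch_idx % 100 == 0:
            pass  # body missing from the post (presumably periodic logging)
    # Returns the loss tensors from the last batch of the epoch
    return loss, loss_validation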

I have a batch size of 512 and 300 epochs. The test set has 250000 inputs and the validation set has 20000. The NN is a simple feed-forward, fully connected network with 8 hidden layers. If I don’t use loss_validation = torch.sqrt(F.mse_loss(model(factors_val), product_val)) the code works fine. However, if I use that line, I get a CUDA out of memory message after epoch 44. The way I go through the epochs is this:

for epoch in range(1, epochs):
    losses.append(train(model, train_loader, optimizer))

All I want to do is save the training and validation loss after each epoch so I can plot them. Can someone tell me what I am doing wrong and how I can fix it? I am really new to PyTorch so it is probably something stupid and obvious, but I really need some help. Thank you!


Hi there,
PyTorch builds the computational graph dynamically during every forward pass, so it records the operations on your validation data exactly as it does for your training data. In short, PyTorch does not know that your validation set is a validation set.

To avoid tracking gradients for the validation forward pass, wrap it in

with torch.no_grad():
    validation_operations ...

This tells torch not to track gradients for those operations, so no graph is stored for them.

You may also want to call model.eval(), which puts your model in evaluation mode: dropout is disabled and batch normalization layers use their running statistics instead of the statistics of the current batch.

Remember to call model.train() again before you resume training.

When you append loss to the list, you are also storing the history of the loss (its computation graph), not only the numerical value. That’s why you run out of memory after a certain number of iterations.

Use loss.item() to keep just the number, for example return loss.item(), loss_validation.item().
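
Putting these points together, one epoch could look roughly like the sketch below. The helper names (rmse, validate, train_one_epoch), the tiny model, and the random data are assumptions made up for illustration, and the backward pass and optimizer step are the usual ones rather than anything shown in the original post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def rmse(output, target):
    # Same loss as in the question: square root of the MSE
    return torch.sqrt(F.mse_loss(output, target))


def validate(model, factors_val, product_val):
    model.eval()                    # evaluation behavior for dropout / batch norm
    with torch.no_grad():           # no graph is built for these operations
        loss = rmse(model(factors_val), product_val)
    return loss.item()              # plain Python float, no history attached


def train_one_epoch(model, train_loader, optimizer, device):
    model.train()                   # back to training behavior
    running = 0.0
    for data, target in train_loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        loss = rmse(model(data), target)
        loss.backward()
        optimizer.step()
        running += loss.item()      # accumulate numbers, not tensors
    return running / len(train_loader)


device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
torch.manual_seed(0)

# Hypothetical small model and random data, standing in for the real ones
model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 4), torch.randn(256, 1)), batch_size=64
)
factors_val = torch.randn(64, 4, device=device)
product_val = torch.randn(64, 1, device=device)

losses = []
for epoch in range(1, 4):
    train_loss = train_one_epoch(model, train_loader, optimizer, device)
    val_loss = validate(model, factors_val, product_val)   # once per epoch
    losses.append((train_loss, val_loss))
```

Here the validation loss is computed once per epoch instead of once per batch, and losses only holds Python floats, so nothing keeps the computation graphs alive between epochs.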


I just wanted to know if it is mandatory to use model.eval() if we are calculating validation loss after each epoch?

Yes. Calling .eval() deactivates your dropout layers (if you have any), as should be the case in the testing phase, so .eval() should be called before computing the validation loss.

Does model.eval() put the model in a state equivalent to “with torch.no_grad()”?

No, model.eval() and torch.no_grad() will not have the same effect.
While model.eval() changes the behavior of some layers (e.g. dropout will be disabled and the running stats of batchnorm layers will be used), with torch.no_grad() disables the gradient calculation.
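
A small sketch of the difference, using a made-up model with a dropout layer (the model and variable names are assumptions for illustration only):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.Dropout(p=0.5), nn.Linear(8, 1))
x = torch.randn(2, 4)

# eval() changes layer behavior, but autograd still tracks the forward pass
model.eval()
out_eval = model(x)
eval_tracks_grad = out_eval.requires_grad

# no_grad() disables tracking, but the model stays in whatever mode it was in
model.train()
with torch.no_grad():
    out_no_grad = model(x)
no_grad_tracks_grad = out_no_grad.requires_grad
still_in_train_mode = model.training
```

In this sketch the output produced under eval() still requires grad, while the output produced under no_grad() does not, and the model is still in training mode afterwards.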


Hi, I am using torch.nn.LSTM for time series. If I don’t have dropout or batch norm, is it necessary to use model.train() and model.eval()?

Not sure it’s mandatory, but I’m pretty sure it’s a best practice. In fact, I wonder if it might reduce computational overhead if you call it versus not. Time it and see? I’m interested to see how that would turn out. If it saves time, I can’t think of any reason why you wouldn’t call it…

Any experts want to weigh in? I’m new to this stuff myself.
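
One way to check this for a specific model is to compare its outputs in both modes. The sketch below assumes a single-layer nn.LSTM with the default dropout of 0 and made-up input sizes; it compares outputs only and does not measure timing.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical sizes; dropout is left at its default of 0
lstm = nn.LSTM(input_size=3, hidden_size=5, batch_first=True)
x = torch.randn(2, 7, 3)   # (batch, sequence length, features)

with torch.no_grad():
    lstm.train()
    out_train, _ = lstm(x)
    lstm.eval()
    out_eval, _ = lstm(x)

# With no dropout (and no batch norm) the two modes should agree
same_output = torch.allclose(out_train, out_eval)
```

If the outputs match, the mode switch makes no numerical difference for that model, though calling model.train() and model.eval() anyway keeps the code correct if dropout is added later.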

Isn’t eval also meant to skip backpropagation and only run the feed-forward pass of the model?
In other words, torch.no_grad() will also be applied once you call model.eval()?

Using the two together would be the common use case, yes, but calling model.eval() does not enable torch.no_grad() by itself.
model.eval() switches some modules between their training and validation behavior, while torch.no_grad() disables the gradient calculation, and some use cases treat these two options independently.
E.g. you might want to leave dropout layers enabled during validation to create multiple (noisy) predictions for the same input samples.
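
A rough sketch of that last use case, with a made-up model and random inputs (all names and sizes here are assumptions): dropout stays active because the model is in training mode, while torch.no_grad() still disables gradient tracking.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(
    nn.Linear(4, 16), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(16, 1)
)
x = torch.randn(8, 4)

# train() keeps dropout active; note it would also put any batch norm
# layers into training mode, which this toy model does not have
model.train()
with torch.no_grad():
    # 10 stochastic forward passes over the same inputs
    preds = torch.stack([model(x) for _ in range(10)])   # shape (10, 8, 1)

mean_pred = preds.mean(dim=0)   # average prediction per sample
spread = preds.std(dim=0)       # disagreement between the noisy passes
```

The spread across passes gives a rough sense of how much the predictions vary under dropout noise.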

Sorry if this is a lame question.
But what happens to the forward pass values when we do a validation forward pass?
To elaborate: we need the forward pass values when updating the weights, and we only zero the gradients (before the optimizer.step()), so can the validation forward pass mess up the weight update? Or do the model parameters only remember the last forward pass, so the validation forward pass is forgotten by the time we do the update (because the training forward pass comes later)?

If you are not wrapping the validation forward pass in a with torch.no_grad() block, the computation graph will be created, which would be needed to calculate the gradients. Unless you call backward() on the output or loss, no gradients will be calculated and the model will thus not be changed. As long as you hold a valid reference to the model output, the computation graph will be kept alive.
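
A minimal sketch of this behavior, using a made-up linear model and random validation inputs (names are assumptions for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
x_val = torch.randn(8, 4)
weights_before = model.weight.detach().clone()

# Validation forward pass without no_grad(): a graph is built...
out = model(x_val)
has_graph = out.grad_fn is not None

# ...but without backward() no gradients exist and the weights are unchanged
grads_untouched = all(p.grad is None for p in model.parameters())
weights_unchanged = torch.equal(weights_before, model.weight)

# Inside no_grad() the graph is not built in the first place
with torch.no_grad():
    out_no_grad = model(x_val)
no_graph = out_no_grad.grad_fn is None
```

The graph attached to out is released once nothing references out any more, which is why storing such tensors in a list keeps the memory in use.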


[Image: loss plot]

Why is the validation loss so low in the first epoch, and why doesn’t it change after that?
Please help me with this.

Thank you