Torch.no_grad() in Validation Set

Below is a code snippet that I am implementing:

def loss_fn(Y_prob, Y_true):
    criterion = nn.BCELoss()
    loss = criterion(Y_prob, Y_true)
    return loss

def loss_for_batch(model, X, Y_true, optimizer=None):
    Y_prob = model(X)
    loss = loss_fn(Y_prob, Y_true)
    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item() * X.shape[0]
def train():
    for X_train, Y_train in train_dl:
        train_loss_batch = loss_for_batch(
                    model, X_train, Y_train, optimizer)

    for X_val, Y_val in val_dl:
        val_loss_batch = loss_for_batch(
                    model, X_val, Y_val)

My question is:
I haven’t used torch.no_grad() for the validation pass, and I am aware that I should, given the computational constraints. But even if I don’t, will it affect my model’s performance?
My doubt is this: when I iterate through val_dl, the computation graph built during that iteration includes the loss tensor, which has requires_grad set to True because I haven’t used torch.no_grad(). Will this graph affect the “next” graph and the gradients that are created on the next iteration through train_dl and backpropagated through loss.backward()?
And is the loss tensor that is created the “same” across all iterations of the dataloaders, irrespective of train_dl or val_dl?

I don’t know if this came from posting here, but you would want the model.eval() to happen outside the training loop, right? Also, you would want the validation loop to iterate over val_dl, probably.

torch.no_grad() saves memory and, sometimes, time, but the recorded autograd graphs are deallocated when validation leaves loss_for_batch: all tensors (Y_prob and loss in particular) go out of scope there, whether or not you have called backward. So at the end of each iteration you are in the same situation with or without no_grad in loss_for_batch. There used to be bugs around this a long time ago, but I think the above holds today.
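A quick way to convince yourself of this (a minimal sketch with a hypothetical toy model, only checking the requires_grad flag and the accumulated gradients):

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(3, 1)
x = torch.randn(4, 3)

# Forward pass without no_grad: the output carries a graph ...
out = model(x)
print(out.requires_grad)  # True

# ... while under no_grad no graph is recorded at all.
with torch.no_grad():
    out_ng = model(x)
print(out_ng.requires_grad)  # False

# An un-backwarded forward pass never touches .grad: gradients
# only change when loss.backward() is actually called.
model(x).sum()            # graph is built, then freed when the temporary dies
print(model.weight.grad)  # None -- nothing was accumulated
```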

That said, I would probably add the no_grad context (or a torch.set_grad_enabled(torch.is_grad_enabled() and optimizer is not None) context, if you want to handle the with-gradient and without-gradient cases in one piece of code).
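Folding that suggestion into the helper from the question might look like this (a sketch; loss_fn is the BCE helper from the snippet, and the usage at the bottom uses hypothetical toy data):

```python
import torch
import torch.nn as nn

def loss_fn(Y_prob, Y_true):
    return nn.BCELoss()(Y_prob, Y_true)

def loss_for_batch(model, X, Y_true, optimizer=None):
    # Record a graph only when we will actually backpropagate.
    with torch.set_grad_enabled(torch.is_grad_enabled() and optimizer is not None):
        Y_prob = model(X)
        loss = loss_fn(Y_prob, Y_true)
    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item() * X.shape[0]

# Hypothetical usage:
model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
opt = torch.optim.SGD(model.parameters(), lr=0.1)
X = torch.rand(4, 3)
Y = torch.randint(0, 2, (4, 1)).float()

val_loss = loss_for_batch(model, X, Y)         # no graph, no update
train_loss = loss_for_batch(model, X, Y, opt)  # backward + step
```

With optimizer=None the forward pass builds no graph, so validation costs no extra memory; with an optimizer it behaves like a normal training step.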

Best regards


The model.eval() should have been outside the training loop. I wrote that erroneously.

Also, thanks for the clarification.

I am adding torch.no_grad() in the train() function, just after the model.eval() call. I guess that won’t raise an alarm.
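For reference, that placement might look like the following (a sketch; the model, dataloaders, and loss_for_batch are toy stand-ins for the objects in this thread, with hypothetical shapes and sizes):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.BCELoss()

def loss_for_batch(model, X, Y_true, optimizer=None):
    Y_prob = model(X)
    loss = loss_fn(Y_prob, Y_true)
    if optimizer is not None:
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return loss.item() * X.shape[0]

train_dl = DataLoader(TensorDataset(torch.rand(8, 3),
                                    torch.randint(0, 2, (8, 1)).float()), batch_size=4)
val_dl = DataLoader(TensorDataset(torch.rand(8, 3),
                                  torch.randint(0, 2, (8, 1)).float()), batch_size=4)

model.train()
for X_train, Y_train in train_dl:
    train_loss_batch = loss_for_batch(model, X_train, Y_train, optimizer)

model.eval()                # eval-mode behaviour for dropout/batch-norm
with torch.no_grad():       # no autograd graphs recorded during validation
    for X_val, Y_val in val_dl:
        val_loss_batch = loss_for_batch(model, X_val, Y_val)
model.train()
```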