Below is a code snippet that I am implementing:
import torch
import torch.nn as nn

def loss_fn(Y_prob, Y_true):
    criterion = nn.BCELoss()
    loss = criterion(Y_prob, Y_true)
    return loss

def loss_for_batch(model, X, Y_true, optimizer=None):
    Y_prob = model(X)
    loss = loss_fn(Y_prob, Y_true)
    if optimizer is not None:
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # return the summed (not mean) loss for this batch
    return loss.item() * X.shape[0]
def train():
    ...
    model.train()
    for X_train, Y_train in train_dl:
        train_loss_batch = loss_for_batch(
            model, X_train, Y_train, optimizer)
    ...
    model.eval()
    for X_val, Y_val in val_dl:
        val_loss_batch = loss_for_batch(
            model, X_val, Y_val)
My question is: I haven’t used torch.no_grad() to turn off gradient tracking during the validation pass, and I am aware that I should, considering the computational constraints. But even if I don’t, will it affect my model’s performance?
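(For context, this is roughly what I mean by using torch.no_grad() — a minimal sketch reusing the loss_for_batch helper above, where val_dl is my validation dataloader:)

    model.eval()
    with torch.no_grad():  # no computation graph is built for these forward passes
        for X_val, Y_val in val_dl:
            val_loss_batch = loss_for_batch(model, X_val, Y_val)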
My doubt is: when I am iterating through val_dl, the computation graph that gets built will include the loss variable, whose requires_grad is True because I haven’t used torch.no_grad(). Will this graph affect the “next” graph and the gradients that are created on the next iteration through train_dl and backpropagated through loss.backward()?
Also, is the loss tensor that gets created the “same” one for all iterations of the dataloaders, irrespective of whether it comes from train_dl or val_dl?
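(To make what I mean by “the same” tensor concrete, this is the kind of check I have in mind — a toy sketch with a hypothetical model and criterion, not my actual code:)

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
    criterion = nn.BCELoss()

    x1, y1 = torch.rand(2, 4), torch.rand(2, 1)
    x2, y2 = torch.rand(2, 4), torch.rand(2, 1)

    loss1 = criterion(model(x1), y1)  # loss from a first forward pass
    loss2 = criterion(model(x2), y2)  # loss from a second forward pass

    print(loss1 is loss2)                # are these the same tensor object?
    print(loss1.grad_fn, loss2.grad_fn)  # does each carry its own graph node?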