Below is the code I am implementing:

```
import torch
import torch.nn as nn

def loss_fn(Y_prob, Y_true):
    # BCELoss expects Y_prob to already be probabilities in [0, 1],
    # i.e. the model's final layer applies a sigmoid.
    criterion = nn.BCELoss()
    loss = criterion(Y_prob, Y_true)
    return loss
```
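
For context, `nn.BCELoss` expects `Y_prob` to be probabilities in `[0, 1]` (I am assuming the model ends in a sigmoid). A quick sanity check of the helper, with made-up tensors just for illustration:

```
import torch

# Fake probabilities and binary targets, only to exercise loss_fn.
probs = torch.sigmoid(torch.randn(4, 1))
targets = torch.randint(0, 2, (4, 1)).float()
print(loss_fn(probs, targets))  # prints a scalar BCE loss tensor
```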

```
def loss_for_batch(model, X, Y_true, optimizer=None):
    Y_prob = model(X)
    loss = loss_fn(Y_prob, Y_true)
    if optimizer is not None:
        # Training mode: backpropagate and update the weights.
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
    # Return the summed loss for the batch (mean loss * batch size).
    return loss.item() * X.shape[0]
```
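
A note on how I call this helper: when `optimizer` is `None`, the update step is skipped, but the forward pass still builds a computation graph (this is the behaviour my question below is about). Illustrative calls, where the batch names are placeholders:

```
# Training: builds a graph, backpropagates, and updates weights.
train_loss = loss_for_batch(model, X_batch, Y_batch, optimizer)

# Validation: no update, but a graph is still built because the
# forward pass runs outside torch.no_grad().
val_loss = loss_for_batch(model, X_batch, Y_batch)
```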

```
def train():
    ...
    model.train()
    for X_train, Y_train in train_dl:
        train_loss_batch = loss_for_batch(
            model, X_train, Y_train, optimizer)
    ...
    model.eval()
    for X_val, Y_val in val_dl:
        val_loss_batch = loss_for_batch(
            model, X_val, Y_val)
```
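
For comparison, here is the variant I believe is recommended, wrapping the validation loop in `torch.no_grad()` so that no graph is built at all (a sketch, assuming the rest of `train()` stays the same):

```
model.eval()
with torch.no_grad():
    for X_val, Y_val in val_dl:
        val_loss_batch = loss_for_batch(
            model, X_val, Y_val)
```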

My question is:

I haven't used `torch.no_grad()` to disable gradient tracking during validation, and I am aware that I should, given the computational cost. But even if I don't, will it affect my model's performance?

My doubt is this: the computation graph built while iterating through `val_dl` will include the `loss` tensor, which has `requires_grad` set to `True` because I haven't used `torch.no_grad()`. Will this graph affect the “next” graph and the gradients that are created on the next iteration through `train_dl` and backpropagated via `loss.backward()`?

And is the `loss` tensor that gets created the “same” one across all iterations of the dataloaders, irrespective of `train_dl` or `val_dl`?
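
To make the question concrete, here is a minimal standalone sketch of the difference I am asking about (the tiny model is made up purely for illustration):

```
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(3, 1), nn.Sigmoid())
x = torch.randn(2, 3)

# Without no_grad: the output is attached to a computation graph.
out = model(x)
print(out.requires_grad, out.grad_fn is not None)  # True True

# With no_grad: no graph is recorded for this forward pass.
with torch.no_grad():
    out = model(x)
print(out.requires_grad, out.grad_fn)  # False None
```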