A question about net.eval()

Hi,
After training on several batches in every epoch, I test my network on the validation set. Here is my pseudocode:

for epoch in range(num_epochs):
    #######  train  #############
    for j, (data, label) in enumerate(train_loader):
        pred = net(data)
        loss = loss_function(pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ######  validation  ##########
        if j % 10 == 0:
            net.eval()
            for jj, (data2, label2) in enumerate(validation_loader):
                pred2 = net(data2)
                val_loss = loss_function(pred2, label2)

After the validation, the gradient of my network parameter (e.g. nn.Parameter(torch.ones(1))) becomes 0!

But if I rewrite my pseudocode as follows:

for epoch in range(num_epochs):
    #######  train  #############
    for j, (data, label) in enumerate(train_loader):
        pred = net(data)
        loss = loss_function(pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ######  validation  ##########
        if j % 10 == 0:
            net.eval()
            for jj, (data2, label2) in enumerate(validation_loader):
                pred2 = net(data2)
                val_loss = loss_function(pred2, label2)

            net.train()

The only difference is that I added net.train() after the validation; with this change, the gradient of the nn.Parameter no longer becomes 0.

So my question:
Should I add net.train() after the validation every time, or did I do something wrong?

Could anyone help me?

Yes, you should call model.train() again before continuing with the training.
This makes sure all layers are set back to training mode, which changes the behavior of e.g. dropout and batchnorm layers.
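
For reference, here is a minimal, self-contained sketch of that pattern. The model, loaders, and loss are dummy placeholders so the snippet runs on its own; the torch.no_grad() context is an extra best practice for validation (it avoids building the autograd graph), not something the train/eval toggle itself requires:

import torch
import torch.nn as nn

# Dummy setup; replace with your own model, data, and loss.
net = nn.Linear(10, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_function = nn.CrossEntropyLoss()
num_epochs = 2
train_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(20)]
validation_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(5)]

for epoch in range(num_epochs):
    for j, (data, label) in enumerate(train_loader):
        net.train()                   # make sure we are in training mode
        optimizer.zero_grad()
        loss = loss_function(net(data), label)
        loss.backward()
        optimizer.step()

        if j % 10 == 0:
            net.eval()                # switch dropout/batchnorm to eval behavior
            with torch.no_grad():     # no gradients needed during validation
                for data2, label2 in validation_loader:
                    val_loss = loss_function(net(data2), label2)
            net.train()               # switch back before the next training batch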