A question about net.eval()

Hi,
After training on several batches in every epoch, I test my network on the validation set. Here is my pseudocode:

for epoch in range(num_epochs):
    #######  train  #############
    for j, (data, label) in enumerate(train_loader):
        pred = net(data)
        loss = loss_function(pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ######  validation  ##########
        if j % 10 == 0:
            net.eval()
            for jj, (data2, label2) in enumerate(validation_loader):
                pred2 = net(data2)
                val_loss = loss_function(pred2, label2)

After the validation, the gradient of my network parameter (e.g. nn.Parameter(torch.ones(1))) becomes 0!

But if I rewrite my pseudocode as follows:

for epoch in range(num_epochs):
    #######  train  #############
    for j, (data, label) in enumerate(train_loader):
        pred = net(data)
        loss = loss_function(pred, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        ######  validation  ##########
        if j % 10 == 0:
            net.eval()
            for jj, (data2, label2) in enumerate(validation_loader):
                pred2 = net(data2)
                val_loss = loss_function(pred2, label2)

            net.train()

The only difference is that I added net.train() after the validation; with this change, the gradient of the nn.Parameter no longer becomes 0.

So my question:
Should I add net.train() after the validation every time, or did I do something wrong?

Could anyone help me?

Yes, you should call model.train() again before continuing with the training.
This makes sure all layers are set back to training mode, which changes the behavior of e.g. dropout and batchnorm layers.
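
For reference, here is a minimal, self-contained sketch of that pattern. The model, loaders, and loss are dummy placeholders so the snippet runs on its own; the torch.no_grad() context is an extra best practice for validation (it avoids building the autograd graph), not something the train/eval toggle itself requires:

import torch
import torch.nn as nn

# Dummy setup; replace with your own model, data, and loss.
net = nn.Linear(10, 2)
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loss_function = nn.CrossEntropyLoss()
num_epochs = 2
train_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(20)]
validation_loader = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(5)]

for epoch in range(num_epochs):
    for j, (data, label) in enumerate(train_loader):
        net.train()                   # make sure we are in training mode
        optimizer.zero_grad()
        loss = loss_function(net(data), label)
        loss.backward()
        optimizer.step()

        if j % 10 == 0:
            net.eval()                # switch dropout/batchnorm to eval behavior
            with torch.no_grad():     # no gradients needed during validation
                for data2, label2 in validation_loader:
                    val_loss = loss_function(net(data2), label2)
            net.train()               # switch back before the next training batch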