Calling model.train()

Hi, I was confused about whether it is necessary to call model.train() after loading a model if I already call optimizer.zero_grad() and optimizer.step() for each batch. I wanted to understand the functionality of model.train().

The roles of model.train() and optimizer.zero_grad()/optimizer.step() are a bit different. Some layers behave differently during training and evaluation, for example dropout and batchnorm layers. That's why we are careful to call .train() and .eval(), as the model's performance could worsen considerably if we don't.
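A minimal sketch of that difference, using a hypothetical toy model with a dropout layer (the model and input here are just for illustration):

```python
import torch
import torch.nn as nn

# Toy model: a linear layer followed by dropout.
model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

model.train()              # dropout is active: activations are randomly zeroed
train_out = model(x)

model.eval()               # dropout is disabled: the forward pass is deterministic
with torch.no_grad():
    eval_out1 = model(x)
    eval_out2 = model(x)

# In eval mode, repeated forward passes on the same input are identical.
assert torch.equal(eval_out1, eval_out2)
```

In train mode, two forward passes on the same input generally differ because a different random subset of activations is dropped each time.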

This is a bit different from optimizer.step(), which performs a gradient descent step based on the gradients computed during backpropagation for the current minibatch. After calling optimizer.step() you don't want those gradients to affect the next batch, so you reset them with optimizer.zero_grad().
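Putting the two together, a typical training loop looks roughly like this (the model, optimizer settings, and random minibatches are placeholders for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(2, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()

# Hypothetical minibatches of (inputs, targets).
batches = [(torch.randn(8, 2), torch.randn(8, 1)) for _ in range(3)]

model.train()                        # put dropout/batchnorm layers in training mode
for inputs, targets in batches:
    optimizer.zero_grad()            # clear gradients left over from the previous batch
    loss = loss_fn(model(inputs), targets)
    loss.backward()                  # compute gradients for this minibatch only
    optimizer.step()                 # gradient descent step using those gradients
```

Note that model.train() only flips the training/eval flag on the modules; it does not touch gradients, which is why both calls are needed.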


Thanks for clarifying!