How Does model.eval() Affect Gradient Descent in PyTorch and How to Handle Frequent Evaluations?

I used the Adam optimizer. If I perform an evaluation after every iteration (instead of every epoch), or after every training step when the epoch length == 1 (i.e., model.train(), ...train steps... grad steps..., model.eval(), ..., model.train() in a loop), gradient descent fails to make progress. However, if I evaluate only once every 100+ iterations, or increase the epoch length to a comparable scale, it works fine. It seems that model.eval() somehow affects the optimizer’s functionality.
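For concreteness, here is a minimal sketch of the loop pattern described above; the model, data, and loss are hypothetical stand-ins, not the actual code from this post:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the real model and data.
model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10), nn.Linear(10, 1))
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters())

x_val, y_val = torch.randn(8, 10), torch.randn(8, 1)

for step in range(200):
    x, y = torch.randn(8, 10), torch.randn(8, 1)

    model.train()
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

    # Evaluation after every single iteration -- the pattern in question.
    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(x_val), y_val)
```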

My first question is: by what mechanism does model.eval() cause the gradients to fail to descend properly?

My second question is: if I must perform evaluation after every iteration—for instance, because I am using PyTorch for compressed sensing reconstruction and want to print the results after each iteration—what is the best way to handle this?

There’s a bit about model.eval() you can read here: Autograd mechanics — PyTorch 2.5 documentation

I would expect just enabling eval() not to affect training, e.g. when you aren’t calling .backward() and .step() on a forward pass performed with .eval().
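As a sanity check, an eval-mode forward pass that never calls .backward() or .step() leaves both the parameters and the BatchNorm running statistics untouched; a minimal sketch:

```python
import copy
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 10), nn.BatchNorm1d(10))
before = copy.deepcopy(model.state_dict())

model.eval()
with torch.no_grad():          # no graph, no .backward(), no .step()
    _ = model(torch.randn(8, 10))
model.train()

# Weights and BatchNorm running stats are bitwise identical.
after = model.state_dict()
assert all(torch.equal(before[k], after[k]) for k in before)
```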


This is my code. “accumulation_steps” always has the value 1. And if I remove the “if not model.training” check, the gradients also easily fail to descend.
Since every time the model switches to eval mode it switches back to train mode at the next training step, maybe train() did something?
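For what it’s worth, train() and eval() only toggle the Module.training flag recursively; they do not touch parameters, gradients, or optimizer state:

```python
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.5))

model.eval()
print(model.training, model[1].training)   # False False
model.train()
print(model.training, model[1].training)   # True True
```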

What happens if you change validation_step to still accept a batch, but do a no-op?
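If this is a Lightning-style setup, a no-op hook could look like the sketch below (assuming the usual validation_step(self, batch, batch_idx) signature); if training then converges, the problem lies in the evaluation body rather than in the eval()/train() switching itself:

```python
def validation_step(self, batch, batch_idx):
    # Accept the batch but do nothing with it.
    return None
```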