Batch norm behavior during test


I found that if I use model.eval() during testing, I get much worse results than with model.train().
I think this may be due to the running averages of the mean and std. My question is: if I keep model.train() and run a lot of inference, does any variable change other than the running mean and std? Will the model deteriorate gradually? Thank you!

FYI, the batch norm modules in the latest PyTorch have an option called track_running_stats.

If track_running_stats is set to False, this layer then does not keep running estimates, and batch statistics are instead used during evaluation time as well.
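To make this concrete, here is a minimal sketch (not taken from the thread) that checks which entries of a batch norm layer's state change after one forward pass in train mode, and what happens to the running buffers when track_running_stats=False. The layer size, batch shape, and variable names are arbitrary choices for illustration. Note that PyTorch stores a running variance (running_var), not a running std.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Default layer: track_running_stats=True
bn = nn.BatchNorm1d(4)

# A batch whose statistics are far from the initial running values (mean 0, var 1)
x = torch.randn(8, 4) * 3 + 5

# Snapshot every parameter and buffer before the forward pass
before = {name: t.clone() for name, t in bn.state_dict().items()}

bn.train()
with torch.no_grad():  # no gradients, no optimizer step: pure inference
    bn(x)

after = bn.state_dict()
changed = sorted(name for name in before if not torch.equal(before[name], after[name]))
print(changed)  # ['num_batches_tracked', 'running_mean', 'running_var']

# With track_running_stats=False the running buffers are not allocated at all
bn_untracked = nn.BatchNorm1d(4, track_running_stats=False)
print(bn_untracked.running_mean, bn_untracked.running_var)  # None None
```

In this sketch the learnable weight and bias stay untouched, since nothing calls an optimizer. Only the running buffers move, so running inference in train mode shifts the statistics that a later model.eval() would rely on. Other layers that behave differently in train mode, such as dropout, are not covered by this check.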

Thank you for your response!
If I set track_running_stats = False, does that mean I should expect exactly the same behavior during training and testing?

Yes, the mean and variance are then computed from the input minibatch in both modes.
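A quick way to check this yourself is the sketch below, which assumes a single BatchNorm1d layer with arbitrary sizes (the names are illustrative only). It compares the output in train and eval mode, and also shows that the output for one sample depends on the other samples in the batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4, track_running_stats=False)
x = torch.randn(8, 4)

with torch.no_grad():
    bn.train()
    out_train = bn(x)

    bn.eval()
    out_eval = bn(x)

    # Same first sample, but placed in a different batch
    y = torch.randn(8, 4)
    y[0] = x[0]
    out_other = bn(y)

# Train and eval agree because both normalize with the batch statistics
same_in_both_modes = torch.allclose(out_train, out_eval)
print(same_in_both_modes)  # True

# The result for a sample changes with the rest of the batch
depends_on_batch = not torch.allclose(out_eval[0], out_other[0])
print(depends_on_batch)
```

This only covers the batch norm layer itself. A full model can still differ between model.train() and model.eval() if it contains other mode-dependent layers such as dropout. Because the statistics come from each test batch, predictions also vary with batch size and batch composition.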