Model.eval() gives incorrect loss for model with batchnorm layers

@falmasri I already posted a working answer in my comment above in this thread: Model.eval() gives incorrect loss for model with batchnorm layers.

It’s not a problem in the sense that it’s not a software bug.

It’s a problem in the sense that if your training is non-stationary, you will see this behavior unless you adjust the momentum term of your BatchNorm layers. We set the default momentum to 0.1 because it was sufficient for most of the workloads we use. Play around with it for your case.
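
For concreteness, here is a minimal sketch of how you could tweak that knob on an existing model. The toy model, input shape, and the 0.3 value are placeholders, not a recommendation; in PyTorch the running statistics are updated as `running_stat = (1 - momentum) * running_stat + momentum * batch_stat`, so a larger momentum weights recent batches more heavily:

```python
import torch
import torch.nn as nn

# Toy model with a BatchNorm layer; stands in for whatever model you have.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),  # default momentum is 0.1
    nn.ReLU(),
)

# Raise the momentum on every BatchNorm layer so the running mean/var
# adapt faster to non-stationary batch statistics.
for m in model.modules():
    if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d)):
        m.momentum = 0.3  # tuning knob; 0.3 is an arbitrary example value

# Train in train() mode as usual; each forward pass updates the running stats.
model.train()
x = torch.randn(8, 3, 32, 32)
_ = model(x)  # updates running_mean / running_var

# In eval() mode the accumulated running stats are used instead of batch stats,
# which is where the train/eval loss gap shows up when they lag behind.
model.eval()
with torch.no_grad():
    _ = model(x)
```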