BatchNorm: .eval() causes worse results

I have a sequential model with several convolutions and batchnorms. After training I save it and load it elsewhere. Now if I load the model and feed data through it, I get good results (the same loss I had after training), but if I call model.eval() after loading, I get much worse losses. Until now I didn't think eval() should change the calculation result.


Do your datasets come from different domains? Sometimes it helps to keep the model in train mode, pass some validation data through it to update the BatchNorm running statistics without backpropagating, and then switch to eval(). This technique is used in some GANs.

However, you should be careful about the error estimate, since the validation data is not completely “clean” anymore in my opinion.
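The trick above can be sketched like this; the model and data here are just toy stand-ins for the real ones:

```python
import torch
import torch.nn as nn

# Toy stand-in for a trained conv + batchnorm model
model = nn.Sequential(nn.Conv2d(1, 4, 3, padding=1), nn.BatchNorm2d(4), nn.ReLU())

model.train()          # BatchNorm updates its running stats in train mode
with torch.no_grad():  # forward passes only: no gradients, no weight updates
    for _ in range(10):
        val_batch = torch.randn(8, 1, 16, 16)  # placeholder validation data
        model(val_batch)  # each pass nudges running_mean / running_var

model.eval()  # inference now uses the refreshed running stats
```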

How large is the difference between the training and validation error?


I use the same data when running with eval() and without it. The code looks something like this. My model is trained to reproduce the same value that it gets as input.

import torch

model = create_model()       # create_model, logPdiff and dtype are defined elsewhere
# model.eval()               # <- uncomment for the bad loss
loss_function = torch.nn.MSELoss()
logPdiff_var = torch.from_numpy(logPdiff).type(dtype)  # the Variable wrapper is no longer needed
proxi = model(logPdiff_var)  # call the module directly instead of model.forward()
loss = loss_function(proxi, logPdiff_var)

I tried running the model on data and switching to eval() afterwards - it doesn't help.
Without eval() the loss is 0.003; with it, 1.43 (the loss at the start of training).
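For what it's worth, this train/eval gap is easy to reproduce in a minimal example when the running stats are far from the data statistics (toy layer and made-up numbers, not the real model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
x = torch.randn(5, 3) * 10 + 100  # data far from BN's initial stats (mean 0, var 1)

bn.train()
y_train = bn(x)  # normalized with the *batch's* own mean/var -> well scaled
bn.eval()
y_eval = bn(x)   # normalized with the barely-updated *running* stats

print(y_train.mean().item())  # ~0: batch statistics match the data exactly
print(y_eval.mean().item())   # far from 0: running stats are still near their init
```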

Small question: are batchnorms in train mode by default? Or should I always call train() before training?

That’s strange behavior. Could you explain a bit more about your model?
Have you changed the momentum in BatchNorm?
Are you preprocessing the data differently?

By default, Modules are initialized with training=True (code).
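A quick check of that default with a toy module (the layers here are arbitrary):

```python
import torch.nn as nn

net = nn.Sequential(nn.Linear(4, 4), nn.BatchNorm1d(4))
print(net.training)     # True: modules start out in train mode
net.eval()
print(net.training)     # False
print(net[1].training)  # eval() propagates to child modules: False
net.train()             # switch back before (re)training
print(net.training)     # True
```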

Also, what was your training batch size?

a = [Sequential(convolution(), batchnorm(), activation_f()) for _ in range(n)]
b = [Sequential(Upsample(), batchnorm(), convolution(), batchnorm(), activation_f(), convolution(), batchnorm(), activation_f()) for _ in range(n)]
# list comprehensions instead of [...] * n, which would reuse the *same* module object n times
model = Sequential(*a, *b)



I tried with only one batch.

If your batch size is only one, I doubt BatchNorm will yield good results. Could you try InstanceNorm instead?

BatchNorm relies on a large batch size to get more accurate running-stats estimates.
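A quick sanity check of the InstanceNorm suggestion (toy 2D setup with assumed shapes): since InstanceNorm normalizes each sample on its own and by default keeps no running stats, it behaves identically at batch size 1 in train and eval mode.

```python
import torch
import torch.nn as nn

norm = nn.InstanceNorm2d(4)   # track_running_stats=False by default
x = torch.randn(1, 4, 8, 8)   # batch size 1 is fine here

norm.train()
y_train = norm(x)
norm.eval()
y_eval = norm(x)
print(torch.allclose(y_train, y_eval))  # True: no running stats, so no train/eval gap
```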

With InstanceNorm I get good results both with and without eval().


After I added ten batches to training, eval() started working as expected, but I don’t understand why. I previously thought batchnorm should work with any number of batches.

Theoretically that may be true. But the smaller the batch size, the more unstable the running-average estimators are. So in practice people use large batch sizes like 64 or larger.
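You can watch how slowly the running mean converges: with the default momentum of 0.1, a few batches leave it far from the true data mean (toy example with made-up numbers):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(1, momentum=0.1)
bn.train()
for _ in range(3):
    # data with true mean 5.0; running_mean starts at 0.0
    bn(torch.full((4, 1), 5.0) + torch.randn(4, 1) * 0.1)

# Exponential moving average: new = (1 - momentum) * old + momentum * batch_mean,
# so after only 3 batches running_mean is still far from 5.0
print(bn.running_mean.item())
```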


Generally, BatchNorm batch sizes shouldn’t be smaller than 32 to get good results. Maybe see the recent GroupNorm paper by Wu & He, which discusses this issue. In the paper itself, I think they also got good results with batch size 16 in BatchNorm, but 32 would be the rule-of-thumb recommended minimum.
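For reference, a minimal GroupNorm sketch (toy shapes): it normalizes over channel groups within each sample, so its output is independent of batch size and identical in train and eval mode.

```python
import torch
import torch.nn as nn

gn = nn.GroupNorm(num_groups=2, num_channels=4)  # 4 channels split into 2 groups
x = torch.randn(1, 4, 8, 8)  # works even at batch size 1

gn.train()
y_train = gn(x)
gn.eval()
y_eval = gn(x)
print(torch.allclose(y_train, y_eval))  # True: no batch statistics involved
```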