Performance degrades severely when eval() is enabled in the test phase

I hit the same problem, but the cause is different from all of the above.

I train the model in half precision to save memory, and a small fraction of the running_var values (around 90,000) exceed the maximum representable float16 value (about 65,504). As a result, those running_var entries become inf and produce wrong results in eval mode. In training mode the accuracy looks normal, because batch statistics are used directly and only a few batches push running_var to inf.
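The overflow itself is easy to reproduce. This is a minimal sketch (using NumPy for illustration) showing that a running_var of ~90,000 cannot be represented in float16 and collapses to inf:

```python
import numpy as np

# float16 can represent finite values only up to ~65504.
fp16_max = np.finfo(np.float16).max   # 65504.0

# A running_var of about 90,000, as stored in float32 during training.
running_var = np.float32(90000.0)

# Casting it down to float16 overflows to inf,
# which then poisons the normalization in eval mode.
overflowed = np.float16(running_var)
print(fp16_max)     # 65504.0
print(overflowed)   # inf
```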

Solution: keep the BN layers in float32 even when training in half precision (float16); you may wish to see here for code.