As @SimonW replied in that thread, it's not a bug; it's how batch norm works. During training, the layer keeps running estimates of the mean and variance of the training data, and those stored estimates are what get used in the evaluation phase.
If you turn track_running_stats off (as suggested in the post), the layer keeps no running estimates and will instead normalize with the mean and std of the current batch, even in eval mode. This is incorrect usage, since the inference result for each sample then depends on whatever other data happens to be in your batch.
As an example, if you ran one image by itself during inference, it would get a different (typically much worse) result than if you ran it in a batch with other images. If you change the batch size during eval with tracking off, you will see what I mean.
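Here is a minimal sketch of that experiment, in case it helps. The layer size, tensor shapes, the made-up random data, and the helper name `eval_first_output` are all my own assumptions for illustration, not anything from the original thread. It runs the same image through a tracked and an untracked `nn.BatchNorm2d` in eval mode, once alone and once inside a larger batch.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def eval_first_output(bn, batch):
    """Hypothetical helper: run `batch` in eval mode, return the first sample's output."""
    bn.eval()
    with torch.no_grad():
        return bn(batch)[0]


tracked = nn.BatchNorm2d(3)  # track_running_stats=True is the default
untracked = nn.BatchNorm2d(3, track_running_stats=False)

# A few forward passes in train mode so the tracked layer accumulates
# running estimates (random data stands in for a real training set).
tracked.train()
with torch.no_grad():
    for _ in range(10):
        tracked(torch.randn(16, 3, 8, 8) * 2.0 + 1.0)

# One "image", plus other images drawn from a different distribution.
image = torch.randn(1, 3, 8, 8) * 2.0 + 1.0
others = torch.randn(7, 3, 8, 8) * 5.0 - 3.0
batch = torch.cat([image, others], dim=0)

# Tracked layer: eval output depends only on the stored running estimates.
same_tracked = torch.allclose(
    eval_first_output(tracked, image),
    eval_first_output(tracked, batch),
    atol=1e-5,
)

# Untracked layer: eval output is normalized with the current batch's statistics.
same_untracked = torch.allclose(
    eval_first_output(untracked, image),
    eval_first_output(untracked, batch),
    atol=1e-5,
)

print("tracked, same result alone vs. in batch:  ", same_tracked)
print("untracked, same result alone vs. in batch:", same_untracked)
```

With tracking on, the first line should print True, because the image is normalized with the stored estimates regardless of what else is in the batch. With tracking off, the second line should print False, because the other images shift the batch mean and std.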