How does PyTorch’s batch norm know whether the forward pass it’s doing is for inference or training?

I see.

As far as I know, the `self.training` flag (toggled by calling `model.train()` or `model.eval()`) only changes the behavior of the Dropout and BatchNorm layers,
so if you don’t use either of these layer types, it has no effect.
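A minimal sketch of how this works in practice: calling `.train()` or `.eval()` on a module sets its `training` attribute, and `BatchNorm` consults that flag on each forward pass to decide between batch statistics and the stored running statistics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
x = torch.randn(8, 4)

# Training mode: self.training is True, so BatchNorm normalizes with
# the current batch's mean/var and updates running_mean / running_var.
bn.train()
print(bn.training)  # True
y_train = bn(x)

# Eval mode: self.training is False, so BatchNorm normalizes with the
# stored running_mean / running_var and leaves them unchanged.
bn.eval()
print(bn.training)  # False
y_eval = bn(x)
```

Note that `.train()` and `.eval()` propagate recursively to all submodules, so one call on the top-level model switches every BatchNorm and Dropout layer at once.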