related: How does PyTorch’s batch norm know if the forward pass it’s doing is for inference or training?