Hello.
When I run the following, o1 and o3 are different. Shouldn't they be the same?
- The model contains only BatchNorm2d layers.
- If the model is not updated in train mode (as below, with no optimizer step), I think o1 and o3 should be the same.
- o2 can differ from o1/o3 because of BatchNorm2d.
- When can o1 and o3 differ? (I think I did something wrong, but I still can't figure it out.)
Thanks!
import torch
x = torch.randn(1, 3, 224, 224)
model = Model()  # contains only BatchNorm2d layers
model.eval()
o1 = model(x)
model.train()
o2 = model(x)  # forward pass only, no optimizer step
model.eval()
o3 = model(x)
The outputs are computed as follows:
- o1 will be computed using the running stats from the batchnorm layers to normalize the corresponding input activations.
- o2 will be computed using the activation stats in the batchnorm layers to normalize the input activations. The running stats of the batchnorm layers will be updated.
- o3 will use the same approach as o1, but with the updated running stats from the previous forward pass.
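The three cases above can be reproduced with a minimal sketch. It assumes a single `nn.BatchNorm2d(3)` layer as a stand-in for `Model` (the real model was not posted), and it shifts and scales the input so the batch stats clearly differ from the initial running stats of mean 0 and variance 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Assumption: a single BatchNorm2d layer stands in for the poster's Model.
model = nn.BatchNorm2d(3)

# Assumption: shift/scale the input so its stats differ visibly from (0, 1).
x = torch.randn(1, 3, 224, 224) * 2 + 1

model.eval()
o1 = model(x)                              # normalized with the initial running stats
mean_before = model.running_mean.clone()

model.train()
o2 = model(x)                              # normalized with batch stats; running stats updated
mean_after = model.running_mean.clone()

model.eval()
o3 = model(x)                              # normalized with the updated running stats

stats_changed = not torch.equal(mean_before, mean_after)
outputs_match = torch.allclose(o1, o3)
print("running_mean changed:", stats_changed)
print("o1 matches o3:", outputs_match)
```

No optimizer is involved here: the running stats are buffers that the train-mode forward pass itself updates, which is why o3 can differ from o1.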
Let me know if this clarifies the use case.
Yes, I think you are right.
I didn't realize that the batch norm running stats are updated during the forward pass.
So o1 and o3 differ because the batchnorm statistics were updated during the o2 forward pass.
Thanks!
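As a follow-up, the size of the running-stat update from one train-mode forward pass can be checked by hand. This is a sketch under a few assumptions: a standalone `nn.BatchNorm2d(3)` with the default `momentum=0.1`, initial running stats of mean 0 and variance 1, and the update rule `new = (1 - momentum) * old + momentum * batch_stat`, where the running variance uses the unbiased batch variance. The variable names are illustrative only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)                     # default momentum is 0.1
x = torch.randn(1, 3, 224, 224) * 2 + 1    # illustrative input, not from the thread

bn.train()
with torch.no_grad():
    bn(x)                                  # one forward pass updates the running stats

# Per-channel batch statistics over the (N, H, W) dimensions.
per_channel = x.transpose(0, 1).reshape(3, -1)
batch_mean = per_channel.mean(dim=1)
batch_var = per_channel.var(dim=1)         # unbiased by default

momentum = 0.1
expected_mean = (1 - momentum) * torch.zeros(3) + momentum * batch_mean
expected_var = (1 - momentum) * torch.ones(3) + momentum * batch_var

print(bn.running_mean, expected_mean)
print(bn.running_var, expected_var)
```

Each further train-mode forward pass applies the same rule again, so the running stats keep drifting toward the batch stats even without any optimizer step.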