I have been wrestling with PyTorch for the last few weeks, and I have noticed a potentially significant difference in the behavior of BatchNorm1d between v0.4.1 and v1.0.0. I wanted to ask here before digging into the code, in case somebody knows of a relevant change. I have reviewed the release notes and found nothing related to it.
On the one hand, I have observed that in version 1.0.0 the model performs poorly (even on the training data) when run in model.eval() mode, but works like a charm in model.train() mode. On the other hand, in version 0.4.1 the model behaves almost identically when switching between model.eval() and model.train().
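To be clear about what kind of train/eval discrepancy I mean, here is a minimal standalone sketch (not my actual model, just a toy BatchNorm1d layer): when the running statistics are still close to their initial values, normalizing with them in eval() gives very different outputs than normalizing with the batch statistics in train().

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(4)
# Data whose statistics differ clearly from the initial
# running stats (running_mean=0, running_var=1).
x = torch.randn(32, 4) * 3.0 + 5.0

bn.train()
out_train = bn(x)   # normalizes with the batch statistics
bn.eval()
out_eval = bn(x)    # normalizes with the running statistics

# After a single training step the running stats are still close to
# their initial values, so the two outputs differ substantially.
print((out_train - out_eval).abs().max())
```

In my case the discrepancy persists long after training, which is why I suspect the running statistics themselves rather than a warm-up effect.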
I am using nn.BatchNorm1d with a momentum of 0.995, which seems to work well for my data in version 0.4.1, whereas the default value of 0.1 produced the effect described above (which is understandable). I have tried varying this parameter and even using it inversely (i.e. 1 - momentum), in case it is implemented that way, but with no success in version 1.0.0.
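For reference, the documented update rule in current PyTorch is running_stat = (1 - momentum) * running_stat + momentum * batch_stat, i.e. momentum is the weight of the *new* observation (the opposite of the usual exponential-moving-average convention). So with momentum=0.995 the running estimates track almost only the latest batch, which is what I am checking with sketches like this (toy layer, not my model):

```python
import torch
import torch.nn as nn

bn = nn.BatchNorm1d(1, momentum=0.995)
x = torch.full((8, 1), 10.0)  # constant batch, batch mean = 10

bn.train()
bn(x)  # one forward pass in training mode updates the running stats

# running_mean = (1 - momentum) * old + momentum * batch_mean
#              = 0.005 * 0 + 0.995 * 10 ≈ 9.95
print(bn.running_mean)
```

With such a value, the running mean is essentially just the last batch mean, which would explain why eval() behaves so differently from train() if anything about this bookkeeping changed between versions.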
As I am implementing a fairly complex model, I cannot provide a handy reproducible example, but I hope this behavior sounds familiar to somebody here. Do you have any clue?
Thanks in advance!