Major difference in BatchNorm behavior between PyTorch v0.4.1 and v1.0.0

I have been wrestling with PyTorch for the last few weeks and I have noticed a potentially big difference in the behavior of nn.BatchNorm1d between v0.4.1 and v1.0.0. I wanted to ask here before I start digging into the source, in case somebody knows about a relevant change. I have reviewed the release notes and didn't find anything related to it.

On the one hand, I have observed that in version 1.0.0 the model does not work well (not even on the training data) when I run it in model.eval() mode, but it works like a charm in model.train() mode. On the other hand, with version 0.4.1 the model behaves almost identically when switching between model.eval() and model.train().
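
To illustrate the kind of check I am doing, here is a minimal sketch (the architecture, sizes, and data are made up for illustration; this is not my actual model):

```python
import torch
import torch.nn as nn

# Toy stand-in: a linear layer followed by BatchNorm1d.
model = nn.Sequential(nn.Linear(16, 8), nn.BatchNorm1d(8))

# "Train" for a while so the running statistics get updated.
model.train()
x = torch.randn(256, 16)
for _ in range(100):
    out_train = model(x)

# Run the same data in eval mode and compare.
model.eval()
out_eval = model(x)

# In 1.0.0 this difference is large for me; in 0.4.1 it is small.
print((out_train - out_eval).abs().max().item())
```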

I am using nn.BatchNorm1d with a momentum of 0.995, which seems to work well for my data in version 0.4.1, while the default value of 0.1 produced the effect described in the previous paragraph (which is understandable, since the running statistics then update only slowly). I have tried wiggling this parameter and even inverting it (i.e. using 1 - momentum), just in case it is defined that way internally, but with no success in version 1.0.0.
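
For reference, my understanding of the update rule is sketched below; note that, as far as I know, PyTorch's momentum is the inverse of the usual exponential-moving-average convention, so momentum=0.1 keeps 90% of the old running statistic:

```python
# My understanding of how BatchNorm updates its running statistics during
# training (a paraphrase, not the actual PyTorch source):
def update_running_stat(running_stat, batch_stat, momentum=0.1):
    # momentum weights the *new* batch statistic, not the old running one.
    return (1 - momentum) * running_stat + momentum * batch_stat

# With momentum=0.995 the running stats track each batch almost exactly,
# which would explain why eval mode matches train mode for me in 0.4.1.
```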

As I am implementing a fairly complex model, I cannot provide a minimal example that reproduces the issue, but I hope this behavior sounds familiar to somebody here :smiley:. Do you have any clue?

Thanks in advance!