Poor performance with eval() and batch normalization

Hi,

I have a simple dataset where each input is 55 floating-point features and each sample has a binary classification label of 0 or 1. The dataset is split roughly equally between the two classes.

I get good performance during training, but after calling eval() the model only outputs zeros.

The model is as follows:

model = nn.Sequential(
    nn.Linear(55, 128),
    nn.BatchNorm1d(128),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),
    nn.ReLU(),
    nn.Linear(64, 8),
    nn.BatchNorm1d(8),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)

I have tried setting track_running_stats=False for the BatchNorm layers. With eval() the model then outputs varying values, but they all fall roughly between 0.2 and 0.3, so the 0/1 classification is still poor.
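
For clarity, by that I mean building each BatchNorm1d layer with track_running_stats=False, roughly like this (same architecture as above):

import torch.nn as nn

# With track_running_stats=False the layers keep no running statistics and always
# normalize with the current batch's statistics, even in eval() mode.
model = nn.Sequential(
    nn.Linear(55, 128),
    nn.BatchNorm1d(128, track_running_stats=False),
    nn.ReLU(),
    nn.Linear(128, 64),
    nn.BatchNorm1d(64, track_running_stats=False),
    nn.ReLU(),
    nn.Linear(64, 8),
    nn.BatchNorm1d(8, track_running_stats=False),
    nn.ReLU(),
    nn.Linear(8, 1),
    nn.Sigmoid()
)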

Is there something I am fundamentally misunderstanding? I should add that my batch size is 128.

Is there some distribution shift between your eval and train data setups (e.g., due to normalization differences)? You can check whether the per-batch statistics differ between training and eval time by computing the mean and variance manually in the forward pass.
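
As a minimal sketch of that check (log_bn_stats is just a name I'm making up here), you could register forward pre-hooks on the BatchNorm1d layers and print each batch's statistics next to the running statistics the layers will use in eval() mode:

import torch.nn as nn

def log_bn_stats(model):
    # Attach a forward pre-hook to every BatchNorm1d layer; on each forward pass it
    # prints the current batch's mean/variance next to the layer's running statistics.
    def make_hook(name, bn):
        def hook(module, inputs):
            x = inputs[0]  # BatchNorm1d input is (N, C); stats are per-feature over dim 0
            batch_mean = x.mean(dim=0).mean().item()
            batch_var = x.var(dim=0, unbiased=False).mean().item()
            print(f"{name} | batch mean {batch_mean:+.4f} vs running mean {bn.running_mean.mean().item():+.4f}"
                  f" | batch var {batch_var:.4f} vs running var {bn.running_var.mean().item():.4f}")
        return hook

    handles = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm1d):
            handles.append(module.register_forward_pre_hook(make_hook(name, module)))
    return handles  # call handle.remove() on each one when done

With the hooks attached, run one training batch and one eval batch through the model; if the printed batch statistics differ substantially from the running statistics (or between the two data setups), that points to the distribution shift.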