Performance difference when using BatchNorm1d vs StandardScaler

I have a basic classification model:

class BNN(nn.Sequential):
    def __init__(self):
        super().__init__()
        self.add_module("Linear", nn.Linear(len(columns), 40))
        self.add_module("ReLU", nn.ReLU(inplace=True))
        self.add_module("Linear2", nn.Linear(40, 1))
        self.add_module("Sigmoid", nn.Sigmoid())

If I train and then evaluate the model on data that I have scaled with sklearn's StandardScaler:

scaler = preprocessing.StandardScaler()
train_scaled = scaler.fit_transform(train[columns])
test_scaled = scaler.transform(test[columns])
evaluate_scaled = scaler.transform(evaluate[columns])

the performance is fine, even great.
However, if I change the model to:

class BNN(nn.Sequential):
    def __init__(self):
        super().__init__()
        self.add_module("Norm", nn.BatchNorm1d(len(columns))) # Layer in question
        self.add_module("Linear", nn.Linear(len(columns), 40))
        self.add_module("ReLU", nn.ReLU(inplace=True))
        self.add_module("Linear2", nn.Linear(40, 1))
        self.add_module("Sigmoid", nn.Sigmoid())

and don’t preprocess the data, the performance degrades significantly.
Is this because BatchNorm1d only sees one batch at a time and would be expected to underperform (a bit), or am I using it wrong? Do I need to do something special because my training data is balanced (upsampled) but my test and evaluation data are imbalanced (1% positive class)?
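
To make the train/eval part of the question concrete, here is a minimal sketch on synthetic data (the tensor `x` and its scale are made up for illustration, not my dataset). It assumes the default `nn.BatchNorm1d` settings (`momentum=0.1`, running statistics initialised to mean 0 and variance 1). Under those assumptions, a single forward pass in train mode moves the running estimates only 10% of the way toward the batch statistics, so the eval-mode output is far from standardised:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Hypothetical unscaled features: mean around 100, std around 10.
x = torch.randn(512, 2) * 10.0 + 100.0

bn = nn.BatchNorm1d(2)  # defaults: momentum=0.1, affine=True
bn.train()
with torch.no_grad():
    train_out = bn(x)   # normalised with this batch's own mean and variance

bn.eval()
with torch.no_grad():
    eval_out = bn(x)    # normalised with the running estimates instead

train_mean = train_out.mean().item()
eval_mean = eval_out.mean().item()
print("train mode mean:", train_mean)
print("eval mode mean: ", eval_mean)
print("running_mean:   ", bn.running_mean)  # 0.1 * batch mean after one update
```

If this matches what happens in a real model, it would point at the running statistics rather than at the class imbalance, but that is an assumption to check against the actual data.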

What would a network that learned only the BatchNorm1d layer look like?
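
One possible shape for such a standalone layer, offered as a sketch rather than a definitive answer: a `BatchNorm1d` with `affine=False` (no learnable scale or shift) and `momentum=None` (the running statistics become a cumulative average), fed the whole training set once in train mode and then switched to eval. The data below is synthetic and the name `norm` is hypothetical. A small gap to `StandardScaler`-style output should remain, because the running variance is the unbiased estimate and `eps` is added before the square root:

```python
import torch
from torch import nn

torch.manual_seed(0)
# Synthetic stand-in for the unscaled feature matrix: 1000 rows, 3 features.
scale = torch.tensor([1.0, 50.0, 5.0])
shift = torch.tensor([0.0, 300.0, -2.0])
x = torch.randn(1000, 3) * scale + shift

# affine=False: the layer only standardises, nothing is learned by backprop.
# momentum=None: cumulative average of batch statistics instead of the
# default exponential average with momentum=0.1.
norm = nn.BatchNorm1d(x.shape[1], affine=False, momentum=None)

norm.train()
with torch.no_grad():
    norm(x)  # one full-batch pass fills running_mean / running_var

norm.eval()
with torch.no_grad():
    out = norm(x)  # now uses the stored statistics, like scaler.transform

# Reference: standardisation with the population variance (ddof=0).
ref = (x - x.mean(dim=0)) / x.std(dim=0, unbiased=False)
max_diff = (out - ref).abs().max().item()
print("max abs difference to reference:", max_diff)
```

With the default `momentum=0.1`, many passes over the data would be needed before the running estimates settle near the dataset statistics.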

I’ve made a simple test and it confirms what I thought. How do I train just a normalisation layer?



import torch
from torch import nn
from torch.utils import data
from sklearn import preprocessing

scaler = preprocessing.StandardScaler()
sk_train_scaled = scaler.fit_transform(train[columns])
sk_test_scaled = scaler.transform(test[columns])
sk_evaluate_scaled = scaler.transform(evaluate[columns])

# Despite the names, these are the raw (unscaled) features fed to BatchNorm1d
train_scaled = train[columns].values
test_scaled = test[columns].values
evaluate_scaled = evaluate[columns].values


x_train = torch.tensor(train_scaled, dtype=torch.float)
y_train = torch.tensor(train[['y']].values, dtype=torch.float)

x_test = torch.tensor(test_scaled, dtype=torch.float)
y_test = torch.tensor(test[['y']].values, dtype=torch.float)

x_evaluate = torch.tensor(evaluate_scaled, dtype=torch.float)
y_evaluate = torch.tensor(evaluate[['y']].values, dtype=torch.float)


batch_size = len(x_train) // 100  # roughly 100 batches per epoch
dataset = data.TensorDataset(x_train.to(DEVICE), y_train.to(DEVICE))
loader = data.DataLoader(dataset, batch_size=batch_size)


class TorchScaler(nn.Sequential):
    def __init__(self):
        super().__init__()
        self.add_module("Norm", nn.BatchNorm1d(len(columns)))


ts = TorchScaler()


ts.train()


sk_train_scaled.std(), sk_train_scaled.mean()


# Train mode: normalises with the batch statistics and updates the running estimates
tx = ts(x_train)
tx.std(), tx.mean()


ts.eval()


# Eval mode: normalises with the stored running estimates
tx2 = ts(x_train)
tx2.std(), tx2.mean()


sk_test_scaled.std(), sk_test_scaled.mean()


# Still in eval mode, now on the test set
txt = ts(x_test)
txt.std(), txt.mean()


and these are the outputs: