Issues with batch norm

Hello! I am having some issues using batch norm. I am at the beginning of building my NN, so for now I am training on just 100 samples and trying to overfit them, just to make sure that the network can learn. The input and output dimensions are both about 1500. Here is my network:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class GW_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(n_inp, 2000)
        self.linear2 = nn.Linear(2000, 2000)
        self.linear3 = nn.Linear(2000, 2000)
        self.linear4 = nn.Linear(2000, 2000)
        self.linear5 = nn.Linear(2000, n_out)
        self.bn = nn.BatchNorm1d(2000)
            
    def forward(self, x):
        x = F.softplus(self.bn(self.linear1(x)))
        x = F.softplus(self.bn(self.linear2(x)))
        x = F.softplus(self.bn(self.linear3(x)))
        x = F.softplus(self.bn(self.linear4(x)))
        x = self.linear5(x) 
        return x

model_gw = GW_NN().cuda()

lrs = 1e-2
optimizer_gw = optim.Adam(model_gw.parameters(), lr = lrs)

for epoch in range(10001):
    model_gw.train()
    for i, dtt in enumerate(my_dataloader):
        optimizer_gw.zero_grad()

        inp = dtt[0].float().cuda()
        output = dtt[1].float().cuda()

        loss = F.mse_loss(model_gw(inp),output)

        loss.backward()
        optimizer_gw.step()

    if epoch % 100 == 0:
        print(loss.data.cpu().numpy())

The training loss goes down okay-ish (printed every 100 epochs):

29418.57
20.279129
11.549426
8.563468
8.235117
8.161551
9.561671
7.5749683
7.60303
7.609553
7.265949
7.824227
10.810941
7.803124
7.6215243
7.977992
7.9355087
7.574047
7.326716

But when I try out the trained NN (on the same data used for training), it fails:

idx = 10
y_real = output_data_phi[idx].data.cpu().numpy()
model_gw.eval()
y_pred = model_gw(input_data)[idx].data.cpu().numpy()

print(((y_pred-y_real)**2).mean())

I am getting 169114.33. I assume the problem has to do with batch norm behaving differently in eval mode, but I am not sure how to fix it. Can someone help me? Thank you!
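
In case it helps to narrow this down, one check I could run is whether the predictions with batch statistics (train mode) differ from the predictions with the running estimates (eval mode) on the same input; just a sketch, using the same model_gw and input_data as above:

model_gw.train()  # batch norm normalizes with per-batch statistics
with torch.no_grad():  # note: train mode still updates the running estimates
    pred_batch_stats = model_gw(input_data)

model_gw.eval()   # batch norm normalizes with the running estimates
with torch.no_grad():
    pred_running_stats = model_gw(input_data)

# A large difference here would point at the running estimates being off.
print((pred_batch_stats - pred_running_stats).abs().max().item())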

You are currently reusing the same batchnorm layer in each step, which might throw off the running stats as well as the trainable affine parameters, since a single layer is collecting statistics from four different activations.
Could you try to use a separate layer for each linear layer and run the code again?

Thank you for your reply. Do you mean that instead of reusing self.bn 4 times I should use self.bn1, self.bn2, self.bn3, self.bn4, or something like that (a different one for each layer)?

Yes, I meant creating 4 different layers as self.bn1, self.bn2, etc. :wink:
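
For example, roughly like this (just a sketch keeping your layer sizes, with one BatchNorm1d per hidden layer):

class GW_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(n_inp, 2000)
        self.linear2 = nn.Linear(2000, 2000)
        self.linear3 = nn.Linear(2000, 2000)
        self.linear4 = nn.Linear(2000, 2000)
        self.linear5 = nn.Linear(2000, n_out)
        # One batch norm layer per hidden layer, so each one keeps
        # its own running statistics and affine parameters.
        self.bn1 = nn.BatchNorm1d(2000)
        self.bn2 = nn.BatchNorm1d(2000)
        self.bn3 = nn.BatchNorm1d(2000)
        self.bn4 = nn.BatchNorm1d(2000)

    def forward(self, x):
        x = F.softplus(self.bn1(self.linear1(x)))
        x = F.softplus(self.bn2(self.linear2(x)))
        x = F.softplus(self.bn3(self.linear3(x)))
        x = F.softplus(self.bn4(self.linear4(x)))
        x = self.linear5(x)
        return x

Alternatively, you could wrap each Linear + BatchNorm1d + activation into an nn.Sequential block so the pairing stays explicit.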

Awesome, it’s working! Thank you!