Issues with batch norm

Hello! I am having some issues using batch norm. I am at the beginning of building my NN, so for now I am training on just 100 samples and trying to overfit them, just to make sure that the network can learn. The input and output dimensions are both about 1500. Here is my network:

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class GW_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(n_inp, 2000)
        self.linear2 = nn.Linear(2000, 2000)
        self.linear3 = nn.Linear(2000, 2000)
        self.linear4 = nn.Linear(2000, 2000)
        self.linear5 = nn.Linear(2000, n_out)
        self.bn = nn.BatchNorm1d(2000)
            
    def forward(self, x):
        x = F.softplus(self.bn(self.linear1(x)))
        x = F.softplus(self.bn(self.linear2(x)))
        x = F.softplus(self.bn(self.linear3(x)))
        x = F.softplus(self.bn(self.linear4(x)))
        x = self.linear5(x) 
        return x

model_gw = GW_NN().cuda()

lrs = 1e-2
optimizer_gw = optim.Adam(model_gw.parameters(), lr = lrs)

for epoch in range(10001):
    model_gw.train()
    for i, dtt in enumerate(my_dataloader):
        optimizer_gw.zero_grad()

        inp = dtt[0].float().cuda()
        output = dtt[1].float().cuda()

        loss = F.mse_loss(model_gw(inp),output)

        loss.backward()
        optimizer_gw.step()

    if epoch % 100 == 0:
        print(loss.data.cpu().numpy())

The training loss goes down okay-ish (printed every 100 epochs):

29418.57
20.279129
11.549426
8.563468
8.235117
8.161551
9.561671
7.5749683
7.60303
7.609553
7.265949
7.824227
10.810941
7.803124
7.6215243
7.977992
7.9355087
7.574047
7.326716

But when I try out the trained NN (on the same data used for training), it fails:

idx = 10
y_real = output_data_phi[idx].data.cpu().numpy()
model_gw.eval()
y_pred = model_gw(input_data)[idx].data.cpu().numpy()

print(((y_pred-y_real)**2).mean())

I am getting 169114.33. I assume the problem has to do with batch norm behaving differently in eval mode, but I am not sure how to fix it. Can someone help me? Thank you!
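
In case it helps to narrow this down, one check I could run is whether the predictions with batch statistics (train mode) differ from the predictions with the running estimates (eval mode) on the same input; just a sketch, using the same model_gw and input_data as above:

model_gw.train()  # batch norm normalizes with per-batch statistics
with torch.no_grad():  # note: train mode still updates the running estimates
    pred_batch_stats = model_gw(input_data)

model_gw.eval()   # batch norm normalizes with the running estimates
with torch.no_grad():
    pred_running_stats = model_gw(input_data)

# A large difference here would point at the running estimates being off.
print((pred_batch_stats - pred_running_stats).abs().max().item())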

You are currently reusing the same batchnorm layer in each step, which might throw off the running stats as well as the trainable affine parameters, since a single layer is collecting statistics from four different activations.
Could you try to use a separate layer for each linear layer and run the code again?

Thank you for your reply. Do you mean that instead of reusing self.bn 4 times I should use self.bn1, self.bn2, self.bn3, self.bn4, or something like that (a different one for each layer)?

Yes, I meant creating 4 different layers as self.bn1, self.bn2, etc. :wink:
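
For example, roughly like this (just a sketch keeping your layer sizes, with one BatchNorm1d per hidden layer):

class GW_NN(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(n_inp, 2000)
        self.linear2 = nn.Linear(2000, 2000)
        self.linear3 = nn.Linear(2000, 2000)
        self.linear4 = nn.Linear(2000, 2000)
        self.linear5 = nn.Linear(2000, n_out)
        # One batch norm layer per hidden layer, so each one keeps
        # its own running statistics and affine parameters.
        self.bn1 = nn.BatchNorm1d(2000)
        self.bn2 = nn.BatchNorm1d(2000)
        self.bn3 = nn.BatchNorm1d(2000)
        self.bn4 = nn.BatchNorm1d(2000)

    def forward(self, x):
        x = F.softplus(self.bn1(self.linear1(x)))
        x = F.softplus(self.bn2(self.linear2(x)))
        x = F.softplus(self.bn3(self.linear3(x)))
        x = F.softplus(self.bn4(self.linear4(x)))
        x = self.linear5(x)
        return x

Alternatively, you could wrap each Linear + BatchNorm1d + activation into an nn.Sequential block so the pairing stays explicit.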

Awesome, it’s working! Thank you!