Adding BatchNorm increases the initial loss value and prevents the network from converging

I am using nn.BatchNorm1d layers in an autoencoder:

    self.enc1 = nn.Linear(28 * 28, 1000)
    self.enc1bn = nn.BatchNorm1d(1000)

    self.enc2 = nn.Linear(1000, 1000)
    self.enc2bn = nn.BatchNorm1d(1000)

    self.enc3 = nn.Linear(1000, 1000)
    self.enc3bn = nn.BatchNorm1d(1000)

    self.bottleneck = nn.Linear(1000, 1000)

    self.dec1 = nn.Linear(1000, 1000)
    self.dec1bn = nn.BatchNorm1d(1000)

    self.dec2 = nn.Linear(1000, 1000)
    self.dec2bn = nn.BatchNorm1d(1000)

    self.dec3 = nn.Linear(1000, 1000)
    self.dec3bn = nn.BatchNorm1d(1000)

    self.dae_out = nn.Linear(1000, 28 * 28)

This seems to hurt training and gives a poorer reconstruction of the input than when I don't use batchnorm. My layer structure is: Linear -> BatchNorm -> ReLU.
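For context, each block follows the Linear -> BatchNorm -> ReLU order described above. A minimal sketch of one such encoder block (only the first block is shown; the input is a batch of flattened 28×28 images):

```python
import torch
import torch.nn as nn

# One encoder block in the Linear -> BatchNorm -> ReLU order used in the model
block = nn.Sequential(
    nn.Linear(28 * 28, 1000),
    nn.BatchNorm1d(1000),
    nn.ReLU(),
)

x = torch.randn(32, 28 * 28)  # a batch of 32 flattened MNIST-sized inputs
out = block(x)
print(out.shape)  # torch.Size([32, 1000])
```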

Any ideas why this is?

It's hard to tell why without more details, but BatchNorm is known to have odd side effects in generative models and autoencoders. Two common culprits: the normalization couples the samples within a batch (each sample's activations depend on the statistics of the whole batch), and the running statistics used in eval mode can differ from the per-batch statistics used during training, so the same input is reconstructed differently at train and test time.
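A quick way to see the batch coupling: in training mode, the same sample is normalized differently depending on which other samples share its batch. A small sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(8).train()  # training mode: normalizes with per-batch statistics

x = torch.randn(16, 8)

# Normalize sample 0 as part of two different batches.
# The outputs differ because the batch mean/variance differ.
full = bn(x)[0]      # sample 0 normalized with statistics of all 16 samples
half = bn(x[:4])[0]  # sample 0 normalized with statistics of only 4 samples

print(torch.allclose(full, half))  # False: the output depends on the rest of the batch
```

In `.eval()` mode the layer switches to its running statistics instead, which is why behavior can change between training and evaluation.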