BatchNorm1d output differs from my implementation

Hi, I want to implement BatchNorm1d myself, but the result is always slightly different from PyTorch's output.

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

# BatchNorm1d:
# The mean and standard-deviation are calculated per-dimension over the mini-batches. 
# Also by default, during training this layer keeps running estimates of its computed mean and variance, 
# which are then used for normalization during evaluation. The running estimates are kept with a default momentum of 0.1.
conv1 = nn.Conv1d(4, 16, 1)
bn1 = nn.BatchNorm1d(16, eps=1e-5, momentum=1)
x = torch.rand(32, 4, 1)


input = conv1(x)
print("Input to batch norm:", input.shape)
after_norm = bn1(input)
print("Output of bn1:", after_norm[0].squeeze())

Ex = input.mean(dim=[0, 2], keepdim=True)
Varx = torch.sqrt(input.var(dim=[0, 2],  keepdim=True)) 
# print(Ex.shape)
after_norm2 = ((input - Ex) / Varx + bn1.eps) * bn1.weight.unsqueeze(0).unsqueeze(-1) + bn1.bias.unsqueeze(0).unsqueeze(-1)
print("Manually output:", after_norm2[0].squeeze())

# test running mean
print("*" * 80)
print("Test Running Mean:")
print(input.mean(dim=(0, 2)))
print(bn1.running_mean)
print("Test Running Variance:")
print(input.var(dim=(0, 2)))
print(bn1.running_var)

The output is:

Input to batch norm: torch.Size([32, 16, 1])
Output of bn1: tensor([ 5.1425e-02, -1.1029e+00,  1.0990e+00, -7.5205e-04,  9.9263e-01,
         1.1077e-01, -2.0028e-01, -2.5355e-01,  3.5257e-01, -2.5200e-01,
         2.7035e-01, -1.3284e+00, -1.4589e+00,  4.0900e-02,  5.6800e-03,
         1.9910e-01], grad_fn=<SqueezeBackward0>)
Manually output: tensor([ 5.0625e-02, -1.0859e+00,  1.0833e+00, -7.4031e-04,  9.7721e-01,
         1.0905e-01, -1.9715e-01, -2.4961e-01,  3.4707e-01, -2.4808e-01,
         2.6616e-01, -1.3077e+00, -1.4362e+00,  4.0275e-02,  5.5915e-03,
         1.9602e-01], grad_fn=<SqueezeBackward0>)
********************************************************************************
Test Running Mean:
tensor([ 0.0606,  0.0907, -0.0440, -0.3544,  0.2975,  0.8878, -0.3703, -0.2941,
        -0.4312,  0.7303,  0.1518,  0.4278,  0.3618, -0.2783, -0.3981, -0.1884],
       grad_fn=<MeanBackward2>)
tensor([ 0.0606,  0.0907, -0.0440, -0.3544,  0.2975,  0.8878, -0.3703, -0.2941,
        -0.4312,  0.7303,  0.1518,  0.4278,  0.3618, -0.2783, -0.3981, -0.1884])
Test Running Variance:
tensor([0.0282, 0.0136, 0.0035, 0.0318, 0.0247, 0.0329, 0.0380, 0.0204, 0.0463,
        0.0267, 0.0224, 0.0332, 0.0319, 0.0167, 0.0294, 0.0222],
       grad_fn=<VarBackward1>)
tensor([0.0282, 0.0136, 0.0035, 0.0318, 0.0247, 0.0329, 0.0380, 0.0204, 0.0463,
        0.0267, 0.0224, 0.0332, 0.0319, 0.0167, 0.0294, 0.0222])

Have a look at this example, where I've implemented the batch norm calculations manually and compared them to your code. :slight_smile:


Hi @ptrblck, thanks for your reply. I looked at your implementation and found that I should calculate the mean and variance along the first and third axes, but after changing that, the result is still different:

Ex = input.mean(dim=[0, 2], keepdim=True)
Varx = torch.sqrt(input.var(dim=[0, 2],  keepdim=True)) 

Setting unbiased=False also does not work.

I have also checked the running mean and running var, and they match (I have updated the question above). Could you please take a look at my code?

Thanks a lot :slight_smile:
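
Update: combining both changes seems to do it for me — the eps apparently belongs inside the square root, together with the biased variance (unbiased=False). In my snippet above I had `(input - Ex) / Varx + bn1.eps`, i.e. dividing by the plain standard deviation and adding eps outside. A minimal self-contained sketch (using a standalone BatchNorm1d instead of the conv output, so the comparison is easy to rerun):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm1d(16, eps=1e-5)
bn.train()  # use batch statistics, not running estimates
x = torch.rand(32, 16, 1)

out_ref = bn(x)

# per-channel stats over the batch and length dims (0 and 2);
# biased variance (unbiased=False) and eps *inside* the sqrt
mean = x.mean(dim=[0, 2], keepdim=True)
var = x.var(dim=[0, 2], unbiased=False, keepdim=True)
out_manual = (x - mean) / torch.sqrt(var + bn.eps) \
    * bn.weight.view(1, -1, 1) + bn.bias.view(1, -1, 1)

print(torch.allclose(out_ref, out_manual, atol=1e-5))  # True
```

Note that the running_var stored by the module is still the unbiased variance — that's why the running-stats check above matched even while the normalized outputs didn't.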