BatchNorm behaves unexpectedly

Hi, I’ve tried to train a CNN after merging the BatchNorm layers into the convolution layers. After the merge, each BatchNorm layer is kept in place and its parameters are reset so that its output should equal its input. The code is below:

           child.eval()
           # reset the buffers and affine parameters to the identity transform
           child.running_mean = child.running_mean.new_full(
               child.running_mean.shape, 0)
           child.running_var = child.running_var.new_full(
               child.running_var.shape, 1)
           if child.weight is not None:
               child.weight.data = child.weight.data.new_full(
                   child.weight.shape, 1)
           if child.bias is not None:
               child.bias.data = child.bias.data.new_full(child.bias.shape, 0)
           child.track_running_stats = False
           child.momentum = 0
           # note: reset_running_stats() is a no-op once track_running_stats
           # is False, but the buffers were already reset manually above
           child.reset_running_stats()
           child.eps = 0

“child” refers to a BatchNorm layer. But unfortunately, the BatchNorm layer’s output still differs from its input after the reset, which is weird.
The torch version is 1.6.0 with CUDA 10.2, running inside Docker.
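
For reference, the merge step folds each BatchNorm’s statistics into the preceding convolution roughly like below (a minimal sketch assuming a Conv2d directly followed by a BatchNorm2d; fold_bn_into_conv is an illustrative helper, not my exact code):

import torch
import torch.nn as nn

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d):
    # per-channel scale: gamma / sqrt(var + eps)
    scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)
    # fold the scale into the conv weights (out_channels is dim 0)
    conv.weight.mul_(scale.reshape(-1, 1, 1, 1))
    # fold the running mean and beta into the conv bias
    if conv.bias is None:
        conv.bias = nn.Parameter(torch.zeros_like(bn.running_mean))
    conv.bias.copy_((conv.bias - bn.running_mean) * scale + bn.bias)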

How large is the difference?
Note that you might be running into limited numerical precision if the max. abs. difference is approx. 1e-6.
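You could check the magnitude with something like this (assuming out and x are the BatchNorm output and input):

# largest absolute elementwise difference
print((out - x).abs().max())
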
This code snippet shows that resetting the parameters as well as the buffers and then calling eval() should work:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(2, 3, 24, 24)

# change bn stats
for _ in range(10):
    out = bn(x)

# make sure output is not equal to input
print((out != x).all())
> tensor(True)

# reset
bn.reset_parameters()
bn.reset_running_stats()

print(dict(bn.named_parameters()))
> {'weight': Parameter containing:
tensor([1., 1., 1.], requires_grad=True), 'bias': Parameter containing:
tensor([0., 0., 0.], requires_grad=True)}

print(dict(bn.named_buffers()))
> {'running_mean': tensor([0., 0., 0.]), 'running_var': tensor([1., 1., 1.]), 'num_batches_tracked': tensor(0)}

bn.eval()
# test again
for _ in range(10):
    out = bn(x)

# check for equal output
print(torch.allclose(out, x)) # use allclose for numerical precision
> True

Thanks for the reply! After checking my code, I found that “model.train()” was being called unexpectedly in the script. This should be the reason for the unexpected behavior: in training mode the BatchNorm layers normalize with the current batch statistics instead of the reset running stats, so the output no longer matches the input.
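
For anyone hitting the same issue, a minimal sketch of the failure mode:

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)  # fresh layer: running_mean=0, running_var=1
x = torch.randn(2, 3, 24, 24)

bn.eval()   # eval mode normalizes with the running stats -> ~identity
print(torch.allclose(bn(x), x))
> True

bn.train()  # train mode normalizes with batch statistics -> output != input
print(torch.allclose(bn(x), x))
> False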