I want to optimize my model's inference performance by folding the batch normalization layer into the conv2d layer during the eval phase. However, all my results are off by somewhere in the range 0.051~0.072 compared to the regular CONV+BN forward path.
My current approach is to update the CONV layer's weights and bias using the pretrained BN layer's parameters, as shown in the equations below:
 var_sqrt = torch.sqrt(BN_running_var + 1e-5)
 w = w * (BN_weight / var_sqrt).reshape(dim)
 b = (b - BN_running_mean) / var_sqrt * BN_weight + BN_bias
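For reference, here is a minimal, self-contained sketch of the fusion described by the equations above, assuming a plain `nn.Conv2d` followed by an `nn.BatchNorm2d`, both in eval mode (the function name `fuse_conv_bn` is my own, not from any library):

```python
import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    # Fold eval-mode BN running statistics into a new conv layer.
    fused = nn.Conv2d(conv.in_channels, conv.out_channels,
                      conv.kernel_size, stride=conv.stride,
                      padding=conv.padding, dilation=conv.dilation,
                      groups=conv.groups, bias=True)
    # Note: eps is 1e-5 (BN's default), not 1e5.
    var_sqrt = torch.sqrt(bn.running_var + bn.eps)
    scale = bn.weight.data / var_sqrt          # per-output-channel scale
    fused.weight.data = conv.weight.data * scale.reshape(-1, 1, 1, 1)
    conv_bias = (conv.bias.data if conv.bias is not None
                 else torch.zeros_like(bn.running_mean))
    fused.bias.data = (conv_bias - bn.running_mean) * scale + bn.bias.data
    return fused
```

With both modules in `.eval()`, `fused(x)` should match `bn(conv(x))` to within float32 rounding, so any larger discrepancy points at a parameter-handling bug rather than the math.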
For example, here are some sample results from the first CONV/BN pair:

CONVlayer + torch batchnorm2d : [0.1672, 0.1658, 0.1661, …, 0.3378, 0.3378, 0.3378]

My combined CONV_BN layer: [0.2186, 0.2173, 0.2176, …, 0.2669, 0.2669, 0.2669]
As a mathematical sanity check, I recorded the output of the first conv layer:
[1.4406e+02, 1.4436e+02, 1.4429e+02, …, 2.5261e+02, 2.5261e+02, 2.5261e+02]
Then, taking just the first conv output value, 1.4406e+02, I looked up the related batch-norm parameters:
 model['bn1.weight']: -0.3468
 model['bn1.bias']: 0.3069
 model['bn1.running_mean']: 124.3252
 torch.sqrt(model['bn1.running_var'] + 1e-05): 77.5449
I used the formula from the torch BatchNorm2d doc: https://pytorch.org/docs/stable/nn.html
Y = (144.06 - 124.3252) / 77.5449 * (-0.3468) + 0.3069 = **0.21864**, which matches my fused implementation but is obviously different from PyTorch's **BatchNorm2d value of 0.1672**.
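The same single-value check in code, with the signs made explicit (the forum formatting appears to have stripped the minus signs; note that `bn1.weight` is negative, which is what the arithmetic above requires):

```python
# Hand-recompute the BatchNorm2d eval-mode formula for one scalar,
# using the bn1 parameter values quoted above.
x = 1.4406e2                 # first conv output value
running_mean = 124.3252      # bn1.running_mean
var_sqrt = 77.5449           # sqrt(bn1.running_var + 1e-5)
gamma = -0.3468              # bn1.weight (negative)
beta = 0.3069                # bn1.bias

y = (x - running_mean) / var_sqrt * gamma + beta
print(round(y, 5))  # -> 0.21864
```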
Note: I searched online and found some related discussion. One thread covers the BN difference between training and eval: https://github.com/pytorch/pytorch/issues/19902, but my model is already in eval mode.
I would really appreciate help explaining the difference, since it is > 1e-4.