Here is a question about fusing torch.nn.BatchNorm2d.

This is a function I wrote myself to fuse torch.nn.BatchNorm2d :face_with_head_bandage:

import torch
import torch.nn as nn

class FuseBN(nn.Module):
    def __init__(self, layer):
        super().__init__()
        eps = layer.eps
        mean = layer.running_mean
        var = layer.running_var
        weight = layer.weight
        bias = layer.bias

        # fold the normalization into a per-channel scale and shift
        bias = bias - (weight * mean) / torch.sqrt(var + eps)
        weight = weight / torch.sqrt(var + eps)

        # reshape to (1, C, 1, 1) so they broadcast over NCHW input
        self.weight = weight.reshape(1, -1, 1, 1)
        self.bias = bias.reshape(1, -1, 1, 1)

    def forward(self, x):
        out = self.weight * x + self.bias
        return out
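The idea is that, in eval mode, BatchNorm2d computes

y = weight * (x - running_mean) / sqrt(running_var + eps) + bias

which should fold into a single per-channel affine transform y = w * x + b with

w = weight / sqrt(running_var + eps)
b = bias - weight * running_mean / sqrt(running_var + eps)

so FuseBN just precomputes w and b.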

When I try to compare the results of BN and FuseBN:

import torch
import torch.nn as nn

data = torch.randn(1, 3, 224, 224)
bn = nn.BatchNorm2d(3)
fuse_bn = FuseBN(bn)

bn_result = bn(data)
fuse_bn_result = fuse_bn(data)

compare_value = torch.max(torch.abs(bn_result - fuse_bn_result))
print(compare_value)

In theory, the difference should be around 1e-5, but the compare_value I get is 0.143. :thinking:

I don’t know why. Please help me, thanks a lot. :handshake:

See BatchNorm2d: when you forward data through the batch norm layer in training mode, you are updating its running averages.
Use bn.eval() to freeze these buffers.
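For example, a quick check with a fresh layer (just for illustration):

import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.randn(1, 3, 224, 224)

print(bn.running_mean)  # all zeros right after init
bn(x)                   # a train-mode forward updates the running stats
print(bn.running_mean)  # now an exponential moving average of the batch mean

bn.eval()
bn(x)                   # an eval-mode forward leaves the buffers untouched
print(bn.running_mean)  # unchanged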

You are comparing the native batchnorm layer in training mode with your FuseBN layer, which uses the eval-mode logic.
Also, right after initialization the running_mean would be all zeros and the running_var all ones, so you might want to train the layer for a few steps so that both layers would indeed normalize the data with the running stats.
This should work:

data = torch.randn(1, 3, 224, 224) * 10 + 5  # shift and scale so the running stats differ from their init values
bn = nn.BatchNorm2d(3)

# a few train-mode forward passes let the running stats converge towards the data stats
for _ in range(100):
    out = bn(data)

print(bn.running_mean)
print(bn.running_var)
bn.eval()  # freeze the running stats and use them in forward

fuse_bn = FuseBN(bn)
bn_result = bn(data)
fuse_bn_result = fuse_bn(data)

compare_value = torch.max(torch.abs(bn_result - fuse_bn_result))
print(compare_value)
> tensor(4.7684e-07, grad_fn=<MaxBackward1>)
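As a side note, instead of checking the max difference by eye, you could also compare the tensors directly, e.g. with torch.allclose:

print(torch.allclose(bn_result, fuse_bn_result, atol=1e-5))  # should print True given the ~5e-7 max difference above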

PS: @mMagmer was a bit faster, but I'm posting this anyway for the sake of completeness with the code.


Awesome, thanks @ptrblck @mMagmer!