# Batch Normalization disambiguation

Hi everybody,
I’m really confused about Batch Normalization’s behaviour in PyTorch.
In theory, BN should compute the mean and variance of the features over all samples in the batch together, separately for each channel.
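For reference, here is a minimal sketch of what "per channel, over the whole batch" means for a 4D input. It assumes training mode (batch statistics with biased variance) and the default `eps=1e-5`. The mean and variance are taken over dims `(0, 2, 3)`, and the result should match `nn.BatchNorm2d` with `affine=False`:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, C, H, W = 8, 3, 5, 5
x = torch.randn(N, C, H, W)

# One mean/variance per channel, computed over batch and spatial dims.
# BN uses the biased variance to normalize in training mode.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
eps = 1e-5
manual = (x - mean) / torch.sqrt(var + eps)

bn = nn.BatchNorm2d(C, affine=False)  # modules start in training mode
reference = bn(x)

matches = torch.allclose(manual, reference, atol=1e-5)
print(matches)  # True
```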

OK, so if I have a matrix as input (just like an image), I have 3 options:

• BatchNorm2d → since my data is 4-dimensional (N, C, H, W)
• BatchNorm1d → after flattening the data (N, C, H, W) → (N, C, L)
• BatchNorm1d → if my data has only one channel (am I right?), I can simply reshape (N, C, L) → (N, L)
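One quick way to see how many separate statistics each option computes is to look at the `running_mean` buffer, which holds one entry per channel. A minimal sketch using the shapes from the question (`N=10`, one channel, 2x2 input):

```python
import torch
import torch.nn as nn

N = 10
x = torch.randn(N, 1, 2, 2)  # one channel, 2x2 "image"

bn2d = nn.BatchNorm2d(1)    # option 1: input (N, C, H, W)
bn1d_c = nn.BatchNorm1d(1)  # option 2: input (N, C, L)
bn1d_l = nn.BatchNorm1d(4)  # option 3: input (N, L), dim 1 is treated as channels

bn2d(x)
bn1d_c(x.view(N, 1, -1))
bn1d_l(x.view(N, -1))

# running_mean has one entry per normalized channel
print(bn2d.running_mean.shape)    # torch.Size([1])
print(bn1d_c.running_mean.shape)  # torch.Size([1])
print(bn1d_l.running_mean.shape)  # torch.Size([4])
```

Options 1 and 2 share a single mean/variance across all 40 values, while option 3 computes 4 independent ones.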

Then I would expect every BN to give me the same output, but that’s not the case: every output is different.
I’m really confused about this, especially about the difference between BatchNorm1d with input of shape (N, C, L) and (N, L). Am I right in saying that shape (N, L) is for one-channel data?
Thanks for your help!

Here is my code (really simple, have a look):

```python
import torch
import torch.nn as nn

class BN2D(nn.Module):
    def __init__(self):
        super(BN2D, self).__init__()
        # nn for mnist
        # input (10, 1, 2, 2)
        self.bn = nn.BatchNorm2d(1)

    def forward(self, x):
        x = self.bn(x)
        # flatten data
        x = x.view(x.size(0), -1)
        return x

class BN1D(nn.Module):
    def __init__(self):
        super(BN1D, self).__init__()
        # nn for mnist
        # input (10, 1, 4)
        self.bn = nn.BatchNorm1d(1)
        # input (10, 4)
        self.bn_1 = nn.BatchNorm1d(4)

    def forward(self, x):
        # flatten data to (N, 1, L) and normalize with one channel
        y = x.view(x.size(0), 1, -1)
        y = self.bn(y)
        y = y.view(y.size(0), -1)

        # flatten data to (N, L) and normalize with L channels
        y_1 = x.view(x.size(0), -1)
        y_1 = self.bn_1(y_1)

        return y, y_1

def main():
    bn1d = BN1D()
    print(bn1d)
    bn2d = BN2D()
    print(bn2d)

    x = torch.randn(10, 1, 2, 2)

    out1d, out1d_1 = bn1d(x)
    out2d = bn2d(x)

    print(out1d)
    print(out1d_1)
    print(out2d)

if __name__ == '__main__':
    main()
```

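No. For an input of shape `(N, L)`, `nn.BatchNorm1d` treats dim 1 as the channel dimension, so in this case you would use `L` channels (`nn.BatchNorm1d(L)`), i.e. a separate mean and variance for each of the `L` features.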
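A minimal sketch of what this means in practice, assuming training mode and `affine=False` so that only the normalization step is compared: `nn.BatchNorm1d(L)` on an `(N, L)` input standardizes each column with its own batch mean and (biased) variance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, L = 10, 4
x = torch.randn(N, L)

bn = nn.BatchNorm1d(L, affine=False)  # L "channels", one per feature
out = bn(x)

# Each column is standardized with its own mean/variance over the batch
manual = (x - x.mean(dim=0)) / torch.sqrt(x.var(dim=0, unbiased=False) + bn.eps)

matches = torch.allclose(out, manual, atol=1e-5)
print(matches)  # True
```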

However, the other two approaches (`nn.BatchNorm2d` on `(N, C, H, W)` vs. `nn.BatchNorm1d` on `(N, C, H*W)`) should yield the same result. Since you are using the affine batchnorm transformation, you have to make sure that the `weight` parameters of both modules are set to equal values (`bias` should be all zeros in both cases anyway).

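```python
import torch
import torch.nn as nn

N, C, H, W = 10, 3, 24, 24
x = torch.randn(N, C, H, W)

bn2d = nn.BatchNorm2d(3)
bn1d = nn.BatchNorm1d(3)

# share the affine parameters so both modules apply the same scale and shift
bn2d.weight = bn1d.weight
bn2d.bias = bn1d.bias

output2d = bn2d(x)
output1d = bn1d(x.view(N, C, -1))
# compare with a small tolerance instead of exact float equality
print(torch.allclose(output2d.view(N, C, -1), output1d, atol=1e-6))
# True
```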
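To see why the `weight` values have to match: with `affine=True`, the output is the normalized input scaled by `weight` and shifted by `bias`, per channel. A minimal sketch, where the hand-picked weights below are illustrative values only (not defaults):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(10, 3, 4, 4)

bn_affine = nn.BatchNorm2d(3)               # normalization + per-channel scale/shift
bn_plain = nn.BatchNorm2d(3, affine=False)  # normalization only

with torch.no_grad():
    bn_affine.weight.copy_(torch.tensor([0.5, 2.0, -1.0]))  # illustrative scales
    bn_affine.bias.zero_()

x_hat = bn_plain(x)
# weight/bias are broadcast over (N, H, W) as shape (1, C, 1, 1)
expected = x_hat * bn_affine.weight.view(1, -1, 1, 1) + bn_affine.bias.view(1, -1, 1, 1)

matches = torch.allclose(bn_affine(x), expected, atol=1e-5)
print(matches)  # True
```

If two modules hold different `weight` values, their outputs differ by exactly this per-channel scale, even though the normalization itself is identical.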

Alternatively, you could set `affine=False` and skip the parameter assignment.
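The same comparison with `affine=False`, as a minimal sketch: neither module has `weight` or `bias` (both are `None`), so there is nothing to share between them.

```python
import torch
import torch.nn as nn

N, C, H, W = 10, 3, 24, 24
x = torch.randn(N, C, H, W)

# No affine parameters, so only the per-channel normalization is applied
bn2d = nn.BatchNorm2d(C, affine=False)
bn1d = nn.BatchNorm1d(C, affine=False)

output2d = bn2d(x)
output1d = bn1d(x.view(N, C, -1))

matches = torch.allclose(output2d.view(N, C, -1), output1d, atol=1e-6)
print(matches)  # True
```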

For completeness: this PR should change the initialization of the affine parameters so that `weight` is initialized with ones.
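Assuming a PyTorch release that already includes this initialization change, the question’s original experiment can be re-checked. Options 1 and 2 now agree without any parameter assignment, while option 3 normalizes each of the 4 features separately and therefore gives a different result. A minimal sketch:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N = 10
x = torch.randn(N, 1, 2, 2)

bn2d = nn.BatchNorm2d(1)
bn1d_c = nn.BatchNorm1d(1)
bn1d_l = nn.BatchNorm1d(4)

# Recent releases initialize weight to ones and bias to zeros
print(bn2d.weight.data, bn2d.bias.data)  # tensor([1.]) tensor([0.])

out_2d = bn2d(x).view(N, -1)                     # option 1: (N, C, H, W)
out_1d_c = bn1d_c(x.view(N, 1, -1)).view(N, -1)  # option 2: (N, C, L)
out_1d_l = bn1d_l(x.view(N, -1))                 # option 3: (N, L), 4 channels

same_12 = torch.allclose(out_2d, out_1d_c, atol=1e-6)
same_13 = torch.allclose(out_2d, out_1d_l, atol=1e-6)
print(same_12, same_13)  # True False
```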


Thank you very much!!