nn.BatchNorm2d with shared weights

Yes, concatenating the inputs allows you to reuse the same layer, but as explained previously, you should not expect identical results.
Normalizing the “full” image over [H, W] will not yield the same result as separately normalizing e.g. 4 patches of size [H//4, W//4].
Here is a small artificial example showing the completely different results:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# setup
x1 = torch.zeros(1, 1, 24, 24)
x2 = torch.ones(1, 1, 24, 24)
x = torch.cat((x1, x2), dim=2)

# full image
bn = nn.BatchNorm2d(1)

out_all = bn(x)
plt.imshow(out_all[0, 0].detach().numpy())
print(out_all.min(), out_all.max(), out_all.mean())
# tensor(-1.0000, grad_fn=<MinBackward1>) tensor(1.0000, grad_fn=<MaxBackward1>) tensor(0., grad_fn=<MeanBackward0>)

print(bn.running_mean)
# tensor([0.0500])
print(bn.running_var)
# tensor([0.9250])
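The running stats above can be reproduced by hand. BatchNorm2d updates them with an exponential moving average (assuming the defaults: momentum=0.1, initial running_mean=0, running_var=1), where running_var is updated with the unbiased batch variance. A minimal sketch:

```python
import torch

# same input as above: 24x24 zeros stacked on 24x24 ones -> [1, 1, 48, 24]
x = torch.cat((torch.zeros(1, 1, 24, 24), torch.ones(1, 1, 24, 24)), dim=2)

momentum = 0.1                       # BatchNorm2d default
batch_mean = x.mean()                # 0.5
batch_var = x.var(unbiased=True)     # ~0.2502 (N = 1152 elements)

# exponential moving average update used by BatchNorm2d in training mode
running_mean = (1 - momentum) * 0.0 + momentum * batch_mean
running_var = (1 - momentum) * 1.0 + momentum * batch_var

print(running_mean)  # tensor(0.0500)
print(running_var)   # ~0.9250
```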

# window approach
bn = nn.BatchNorm2d(1)
out = torch.cat([bn(x_) for x_ in x.split(24, dim=2)], dim=2)
plt.imshow(out[0, 0].detach().numpy())
print(out.min(), out.max(), out.mean())
# tensor(0., grad_fn=<MinBackward1>) tensor(0., grad_fn=<MaxBackward1>) tensor(0., grad_fn=<MeanBackward0>)

print(bn.running_mean)
# tensor([0.1000])
print(bn.running_var)
# tensor([0.8100])
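The all-zero output in the window approach follows directly from the batchnorm formula: each 24x24 patch is constant, so its batch mean equals that constant and its batch variance is 0, making (x - mean) / sqrt(var + eps) exactly 0 for every element. A quick check (eps = 1e-5 is the BatchNorm2d default):

```python
import torch

patch = torch.ones(1, 1, 24, 24)     # one constant patch from the split
mean = patch.mean()                  # 1.0
var = patch.var(unbiased=False)      # 0.0 -- no variation inside the patch
eps = 1e-5                           # BatchNorm2d default eps

# the normalization step reduces to 0 / sqrt(eps) = 0 everywhere
normed = (patch - mean) / torch.sqrt(var + eps)
print(normed.abs().max())  # tensor(0.)
```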