nn.BatchNorm2d with shared weights

Yes, concatenating the inputs allows you to reuse the same layer, but as explained previously, you should not expect identical results.
Normalizing the “full” image over [H, W] will not yield the same result as separately normalizing e.g. 4 patches of size [H//4, W//4].
Here is a small artificial example showing the completely different results:

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

# setup
x1 = torch.zeros(1, 1, 24, 24)
x2 = torch.ones(1, 1, 24, 24)
x = torch.cat((x1, x2), dim=2)

# full image
bn = nn.BatchNorm2d(1)

out_all = bn(x)
plt.imshow(out_all[0, 0].detach().numpy())
print(out_all.min(), out_all.max(), out_all.mean())
# tensor(-1.0000, grad_fn=<MinBackward1>) tensor(1.0000, grad_fn=<MaxBackward1>) tensor(0., grad_fn=<MeanBackward0>)

print(bn.running_mean)
# tensor([0.0500])
print(bn.running_var)
# tensor([0.9250])
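The running stats above can be reproduced by hand. BatchNorm2d updates them with an exponential moving average (assuming the defaults: momentum=0.1, initial running_mean=0, running_var=1), where running_var is updated with the unbiased batch variance. A minimal sketch:

```python
import torch

# same input as above: 24x24 zeros stacked on 24x24 ones -> [1, 1, 48, 24]
x = torch.cat((torch.zeros(1, 1, 24, 24), torch.ones(1, 1, 24, 24)), dim=2)

momentum = 0.1                       # BatchNorm2d default
batch_mean = x.mean()                # 0.5
batch_var = x.var(unbiased=True)     # ~0.2502 (N = 1152 elements)

# exponential moving average update used by BatchNorm2d in training mode
running_mean = (1 - momentum) * 0.0 + momentum * batch_mean
running_var = (1 - momentum) * 1.0 + momentum * batch_var

print(running_mean)  # tensor(0.0500)
print(running_var)   # ~0.9250
```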

# window approach
bn = nn.BatchNorm2d(1)
out = torch.cat([bn(x_) for x_ in x.split(24, dim=2)], dim=2)
plt.imshow(out[0, 0].detach().numpy())
print(out.min(), out.max(), out.mean())
# tensor(0., grad_fn=<MinBackward1>) tensor(0., grad_fn=<MaxBackward1>) tensor(0., grad_fn=<MeanBackward0>)

print(bn.running_mean)
# tensor([0.1000])
print(bn.running_var)
# tensor([0.8100])
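The all-zero output in the window approach follows directly from the batchnorm formula: each 24x24 patch is constant, so its batch mean equals that constant and its batch variance is 0, making (x - mean) / sqrt(var + eps) exactly 0 for every element. A quick check (eps = 1e-5 is the BatchNorm2d default):

```python
import torch

patch = torch.ones(1, 1, 24, 24)     # one constant patch from the split
mean = patch.mean()                  # 1.0
var = patch.var(unbiased=False)      # 0.0 -- no variation inside the patch
eps = 1e-5                           # BatchNorm2d default eps

# the normalization step reduces to 0 / sqrt(eps) = 0 everywhere
normed = (patch - mean) / torch.sqrt(var + eps)
print(normed.abs().max())  # tensor(0.)
```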