# Layernorm doesn't result in std=1 per data point

`nn.LayerNorm` doesn't seem to compute what it should. I compared the results of `nn.LayerNorm` with a manual calculation:

Code:

```python
import torch
import torch.nn as nn

batch = torch.tensor([[4, 3, 1],
                      [0, 2, 0]]).float()

layernorm = nn.LayerNorm(normalized_shape=3, eps=0, elementwise_affine=False)
out = layernorm(batch)

mean0 = torch.mean(batch)
var0 = torch.var(batch)
result = (batch - mean0) / torch.sqrt(var0)
```

For layernorm:

```
tensor([[ 1.0690,  0.2673, -1.3363],
        [-0.7071,  1.4142, -0.7071]])
std: tensor(1.2247)
```

For my manual version:

```
tensor([ 0.8729,  0.2182, -1.0911])
std: tensor(1.)
```

In your manual approach you are using the unbiased variance, while `nn.LayerNorm` uses the biased one.
From the docs:

> The standard-deviation is calculated via the biased estimator, equivalent to `torch.var(input, unbiased=False)`
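To see the difference numerically, here is a minimal check using the first row of your batch, `[4, 3, 1]` (the two estimators only differ in whether they divide by `n - 1` or `n`):

```python
import torch

row = torch.tensor([4., 3., 1.])

# Unbiased (Bessel-corrected) variance: sum of squared deviations / (n - 1) = / 2
var_unbiased = torch.var(row, unbiased=True)

# Biased variance, as used internally by nn.LayerNorm: / n = / 3
var_biased = torch.var(row, unbiased=False)

print(var_unbiased)  # tensor(2.3333)
print(var_biased)    # tensor(1.5556)
```

Normalizing with the smaller biased denominator divides by a smaller standard deviation, which is why the layernorm outputs are larger in magnitude than your manual ones.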

I also get the same results using the `nn.LayerNorm` module and a manual approach:

```python
import torch
import torch.nn as nn

batch = torch.tensor([[4, 3, 1],
                      [0, 2, 0]]).float()

layernorm = nn.LayerNorm(normalized_shape=3, eps=0, elementwise_affine=False)

out = layernorm(batch)

# normalize each row with its own mean and biased variance
mean0 = torch.mean(batch, dim=-1)
var0 = torch.var(batch, unbiased=False, dim=-1)
result = (batch - mean0.unsqueeze(1)) / torch.sqrt(var0.unsqueeze(1))

print(out)
# tensor([[ 1.0690,  0.2673, -1.3363],
#         [-0.7071,  1.4142, -0.7071]])

print(result)
# tensor([[ 1.0690,  0.2673, -1.3363],
#         [-0.7071,  1.4142, -0.7071]])

print(out.std())
# tensor(1.0954)
print(out.std(dim=-1))
# tensor([1.2247, 1.2247])

# single sample
sample = batch[0]
mean0 = torch.mean(sample)
var0 = torch.var(sample, unbiased=False)
result = (sample - mean0) / torch.sqrt(var0)

print(result)
# tensor([ 1.0690,  0.2673, -1.3363])

print(result.std())
# tensor(1.2247)
```
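The `1.2247` in the last print is also expected rather than a bug: `Tensor.std()` defaults to the unbiased estimator, so a row normalized to a *biased* std of 1 reports an *unbiased* std of `sqrt(n / (n - 1))`, which for `n = 3` is `sqrt(1.5) ≈ 1.2247`. A quick sketch verifying this (the row values are just the first sample from above):

```python
import math
import torch

n = 3
row = torch.tensor([4., 3., 1.])

# Normalize with the biased std, as nn.LayerNorm does internally.
normed = (row - row.mean()) / row.std(unbiased=False)

print(normed.std(unbiased=False))  # ~1.0: biased std of the output is exactly 1
print(normed.std())                # ~1.2247: default unbiased std is larger
print(math.sqrt(n / (n - 1)))      # ~1.2247: sqrt(n / (n - 1)) for n = 3
```

So per data point the output does have (biased) std 1; the apparent mismatch comes entirely from which estimator the check uses.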