Layernorm doesn't result in std=1 per data point

nn.LayerNorm doesn't seem to calculate what I expect. I compared the output of nn.LayerNorm with a manual calculation:

code:

import torch
import torch.nn as nn

batch = torch.tensor([[4, 3, 1],
                      [0, 2, 0]]).float()

layernorm = nn.LayerNorm(normalized_shape=3, eps=0, elementwise_affine=False)
out = layernorm(batch)

mean0 = torch.mean(batch[0])
var0 = torch.var(batch[0])
result = (batch[0] - mean0) / torch.sqrt(var0)

for layernorm:

tensor([[ 1.0690,  0.2673, -1.3363],
        [-0.7071,  1.4142, -0.7071]])
std: tensor(1.2247)

for my manual version:

tensor([ 0.8729,  0.2182, -1.0911])
std: tensor(1.)

In your manual approach you are using the unbiased variance.
From the docs:

The standard-deviation is calculated via the biased estimator, equivalent to torch.var(input, unbiased=False)
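For the first row of your example, the two estimators differ by a factor of n/(n-1) (n=3 here). A quick standalone check:

```python
import torch

x = torch.tensor([4.0, 3.0, 1.0])
print(torch.var(x))                  # unbiased (default): divides by n-1 -> tensor(2.3333)
print(torch.var(x, unbiased=False))  # biased: divides by n, what LayerNorm uses -> tensor(1.5556)
```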

I also get the same results using the nn.LayerNorm module and a manual approach:

batch = torch.tensor([[4, 3, 1],
                      [0, 2, 0]]).float()

layernorm = nn.LayerNorm(normalized_shape=3, eps=0, elementwise_affine=False)

out = layernorm(batch)

mean0 = torch.mean(batch, dim=-1)
var0 = torch.var(batch, unbiased=False, dim=-1)
result = (batch - mean0.unsqueeze(1)) / torch.sqrt(var0.unsqueeze(1))

print(out)
# tensor([[ 1.0690,  0.2673, -1.3363],
#        [-0.7071,  1.4142, -0.7071]], dtype=torch.float32)

print(result)
# tensor([[ 1.0690,  0.2673, -1.3363],
#         [-0.7071,  1.4142, -0.7071]], dtype=torch.float32)

print(out.std())
# tensor(1.0954, dtype=torch.float32)
print(out[0].std())
# tensor(1.2247, dtype=torch.float32)


# single sample
mean0 = torch.mean(batch[0])
var0 = torch.var(batch[0], unbiased=False)
result = (batch[0] - mean0) / torch.sqrt(var0)

print(result)
# tensor([ 1.0690,  0.2673, -1.3363], dtype=torch.float32)

print(result.std())
#tensor(1.2247, dtype=torch.float32)
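The 1.2247 is expected: LayerNorm normalizes with the biased std, while tensor.std() defaults to the unbiased estimator, so measuring the normalized row yields a factor of sqrt(n/(n-1)) = sqrt(3/2) ≈ 1.2247 for n=3. A standalone check:

```python
import math
import torch

x = torch.tensor([4.0, 3.0, 1.0])
# Normalize with the biased std, as LayerNorm does internally
normed = (x - x.mean()) / x.std(unbiased=False)

print(normed.std())            # unbiased std of the normalized row -> tensor(1.2247)
print(math.sqrt(3 / 2))        # same factor: sqrt(n / (n-1)) for n=3
```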