# Understanding `torch.nn.LayerNorm` in NLP

I’m trying to understand how `torch.nn.LayerNorm` works in an NLP model. Assume the input is a batch of sequences of word embeddings:

```python
import torch

batch_size, seq_size, dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, dim)
print("x: ", embedding)

layer_norm = torch.nn.LayerNorm(dim)
print("y: ", layer_norm(embedding))

# outputs:
"""
x:  tensor([[[ 0.5909,  0.1326,  0.8100,  0.7631],
[ 0.5831, -1.7923, -0.1453, -0.6882],
[ 1.1280,  1.6121, -1.2383,  0.2150]],

[[-0.2128, -0.5246, -0.0511,  0.2798],
[ 0.8254,  1.2262, -0.0252, -1.9972],
[-0.6092, -0.4709, -0.8038, -1.2711]]])
y:  tensor([[[ 0.0626, -1.6495,  0.8810,  0.7060],
[ 1.2621, -1.4789,  0.4216, -0.2048],
[ 0.6437,  1.0897, -1.5360, -0.1973]],

[[-0.2950, -1.3698,  0.2621,  1.4027],
[ 0.6585,  0.9811, -0.0262, -1.6134],
[ 0.5934,  1.0505, -0.0497, -1.5942]]],
"""
```

From the documentation, my understanding is that the mean and std are computed over all embedding values per sample. So I tried to compute `y[0, 0, :]` manually:

```python
mean = torch.mean(embedding[0, :, :])
std = torch.std(embedding[0, :, :])
print((embedding[0, 0, :] - mean) / std)
```

which gives `tensor([ 0.4310, -0.0319, 0.6523, 0.6050])`, and that’s not the right output. What is the correct way to compute `y[0, 0, :]`?

This should work:

```python
import torch

batch_size, seq_size, dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, dim)
print("x: ", embedding)

layer_norm = torch.nn.LayerNorm(dim)
y = layer_norm(embedding)
print("y: ", y)

# Normalize over the last dimension with the *biased* variance (unbiased=False),
# which is what LayerNorm uses internally.
out = (embedding - torch.mean(embedding, dim=2, keepdim=True)) / torch.sqrt(
    torch.var(embedding, dim=2, keepdim=True, unbiased=False) + layer_norm.eps
)

print((out - y).abs().max())
```

I found the problem. I needed to pass `unbiased=False` to `torch.std(embedding[0, :, :])`, since `LayerNorm` uses the biased estimator.
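To make the fix concrete, here is a minimal self-contained sketch that computes `y[0, 0, :]` by hand. Note two corrections to the original attempt: the statistics are taken over the last (embedding) dimension only, not over the whole sample, and the variance is biased (`unbiased=False`). The seed and tolerance are my own choices for reproducibility.

```python
import torch

torch.manual_seed(0)
batch_size, seq_size, dim = 2, 3, 4
embedding = torch.randn(batch_size, seq_size, dim)

layer_norm = torch.nn.LayerNorm(dim)  # weight=1, bias=0 at init
y = layer_norm(embedding)

# Mean and variance over the last dimension of one position only,
# with the biased variance estimator, plus eps inside the sqrt.
mean = torch.mean(embedding[0, 0, :])
var = torch.var(embedding[0, 0, :], unbiased=False)
manual = (embedding[0, 0, :] - mean) / torch.sqrt(var + layer_norm.eps)

print(torch.allclose(manual, y[0, 0, :], atol=1e-5))
```

This matches `LayerNorm`'s output up to floating-point noise, confirming that each position's embedding vector is normalized independently.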