Hey guys! Out of interest, I wanted to reimplement the functionality of nn.LayerNorm, but I cannot wrap my head around a dummy example: I expected ref and out below to be the same. For context, the embedding is meant to be a single sentence (batch_size = 1) with two words, where each word has an embedding dimension of two. Thank you a lot!
import torch
import torch.nn as nn

# batch_size = 1, two words, embedding dim 2 -> shape (1, 2, 2)
embedding = torch.FloatTensor([[[2, 1], [3, 4]]])

layer_norm = nn.LayerNorm([2])
ref = layer_norm(embedding)

# my attempt: statistics over the word and embedding dims together
mean = embedding.mean(dim=(1, 2))
std = embedding.std(dim=(1, 2))
out = (embedding - mean) / (std + 1e-5)

out, ref
=>
(tensor([[[-0.3873, -1.1619],
          [ 0.3873,  1.1619]]]),
 tensor([[[ 1.0000, -1.0000],
          [-1.0000,  1.0000]]], grad_fn=<NativeLayerNormBackward0>))
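
For reference, my reading of the nn.LayerNorm docs is y = (x - E[x]) / sqrt(Var[x] + eps), where the statistics are taken only over the last len(normalized_shape) dimensions (here: per word), the variance is the biased one (unbiased=False), and eps sits inside the square root rather than being added to the std. A minimal sketch of that reading, which does reproduce ref on this example:

import torch
import torch.nn as nn

embedding = torch.FloatTensor([[[2, 1], [3, 4]]])
ref = nn.LayerNorm([2])(embedding)

# normalize each word over the last dim only, keeping dims for broadcasting
mean = embedding.mean(dim=-1, keepdim=True)
var = embedding.var(dim=-1, unbiased=False, keepdim=True)  # biased variance
manual = (embedding - mean) / torch.sqrt(var + 1e-5)       # eps inside the sqrt

print(torch.allclose(manual, ref))  # True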