LayerNorm implementation?

Shouldn't the following code give y1 == y2?

import torch
import torch.nn as nn

x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)

y1 = model(x)

# Manual normalization over the last dimension
mean = x.mean(-1, keepdim=True)
var = x.var(-1, keepdim=True)
y2 = (x - mean) / torch.sqrt(var + model.eps)

So the answer is:

...
var = torch.mean((x - mean) ** 2, -1, keepdim=True)
...

I got the variance wrong: torch.var defaults to the unbiased estimator (dividing by N - 1), while LayerNorm normalizes with the biased variance (dividing by N).
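For anyone who hits the same mismatch, here is a minimal self-contained sketch (same shapes as above; the atol tolerance is just a loose choice for float32 rounding) showing that the biased variance reproduces nn.LayerNorm:

import torch
import torch.nn as nn

x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)
y1 = model(x)

mean = x.mean(-1, keepdim=True)

# Biased variance (divide by N), which is what LayerNorm uses
var = torch.mean((x - mean) ** 2, -1, keepdim=True)
# Equivalent: var = x.var(-1, keepdim=True, unbiased=False)

y2 = (x - mean) / torch.sqrt(var + model.eps)

print(torch.allclose(y1, y2, atol=1e-6))  # True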
