LayerNorm implementation?

Shouldn't the following code give y1 == y2?

import torch
import torch.nn as nn

x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)

y1 = model(x)

# Manual normalization over the last dimension
mean = x.mean(-1, keepdim=True)
var = x.var(-1, keepdim=True)
y2 = (x - mean) / torch.sqrt(var + model.eps)

So the answer is:

...
var = torch.mean((x - mean) ** 2, -1, keepdim=True)
...

I got the variance wrong: torch.var defaults to the unbiased estimator (dividing by N - 1), while LayerNorm normalizes with the biased variance (dividing by N).
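For anyone who hits the same mismatch, here is a minimal self-contained sketch (same shapes as above; the atol tolerance is just a loose choice for float32 rounding) showing that the biased variance reproduces nn.LayerNorm:

import torch
import torch.nn as nn

x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)
y1 = model(x)

mean = x.mean(-1, keepdim=True)

# Biased variance (divide by N), which is what LayerNorm uses
var = torch.mean((x - mean) ** 2, -1, keepdim=True)
# Equivalent: var = x.var(-1, keepdim=True, unbiased=False)

y2 = (x - mean) / torch.sqrt(var + model.eps)

print(torch.allclose(y1, y2, atol=1e-6))  # True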
