I don’t understand what nn.LayerNorm is doing:
import torch
import torch.nn as nn
x = torch.arange(4).float()
l = nn.LayerNorm(4, elementwise_affine=True)
# I expected the value below to be 1
(l(x) * x.std(-1) / (x - x.mean(-1, keepdim=True)))[0]
But this code returns:
1.1547
I think this value is:
np.sqrt(4)/np.sqrt(3) = 1.1547005383792517
Why does LayerNorm multiply the result by this prefactor?
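A small check that may help narrow this down (a sketch, assuming a reasonably recent PyTorch; the names `ratio_unbiased`, `ratio_biased` and `manual` are just illustrative). `Tensor.std` applies Bessel's correction by default (it divides by N-1), while the nn.LayerNorm documentation states that it normalizes with the biased estimator (dividing by N). The ratio of those two standard deviations is sqrt(N/(N-1)), which for N=4 is exactly sqrt(4)/sqrt(3). The snippet recomputes the ratio with both estimators and also rebuilds the LayerNorm output by hand:

```python
import math

import torch
import torch.nn as nn

x = torch.arange(4).float()
n = x.shape[-1]
# With elementwise_affine=True the weight starts at ones and the bias at zeros,
# so at initialization the layer only normalizes.
ln = nn.LayerNorm(n, elementwise_affine=True)

centered = x - x.mean(-1, keepdim=True)
with torch.no_grad():
    y = ln(x)

# Undo the normalization using each estimator of the standard deviation.
ratio_unbiased = (y * x.std(-1, unbiased=True) / centered)[0].item()  # divides by N-1
ratio_biased = (y * x.std(-1, unbiased=False) / centered)[0].item()   # divides by N

# Rebuild the output manually with the biased variance plus ln.eps.
manual = centered / torch.sqrt(x.var(-1, unbiased=False, keepdim=True) + ln.eps)

print(ratio_unbiased)          # close to sqrt(4/3)
print(ratio_biased)            # close to 1; only eps keeps it from being exact
print(math.sqrt(n / (n - 1)))  # the prefactor from the question
print(torch.allclose(y, manual, atol=1e-6))
```

If `ratio_biased` comes out at (almost) 1, the prefactor is just the difference between the N-1 and N denominators, and the small remaining deviation comes from `eps` (1e-5 by default).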