Shouldn't the following code give y1 == y2?
import torch
import torch.nn as nn

x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)
y1 = model(x)
mean = x.mean(-1, keepdim=True)
var = x.var(-1, keepdim=True)
y2 = (x - mean) / torch.sqrt(var + model.eps)
So the answer is:
...
var = torch.mean((x-mean)**2, -1, keepdim = True)
...
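I got the variance wrong: x.var() defaults to the unbiased estimator (dividing by N-1), while nn.LayerNorm uses the biased variance (dividing by N).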
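For anyone who wants to check this, here is a minimal sketch comparing both variance estimators against nn.LayerNorm. The variable names (var_biased, var_unbiased, matches_biased, matches_unbiased) and the 1e-5 tolerance are my own choices for illustration, and unbiased=False is assumed to be equivalent to the torch.mean((x-mean)**2, ...) line above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(64, 256)
model = nn.LayerNorm(256, elementwise_affine=False)
y1 = model(x)

mean = x.mean(-1, keepdim=True)

# Default x.var() is the unbiased estimator: divides by N - 1.
var_unbiased = x.var(-1, keepdim=True)
# Biased estimator: divides by N, same as torch.mean((x - mean) ** 2, -1).
var_biased = x.var(-1, keepdim=True, unbiased=False)

y2_unbiased = (x - mean) / torch.sqrt(var_unbiased + model.eps)
y2_biased = (x - mean) / torch.sqrt(var_biased + model.eps)

# Compare with a small tolerance instead of ==, since float32 rounding
# can differ between the two code paths.
matches_biased = torch.allclose(y1, y2_biased, atol=1e-5)
matches_unbiased = torch.allclose(y1, y2_unbiased, atol=1e-5)
print(matches_biased, matches_unbiased)
```

Even with the right variance, an exact y1 == y2 comparison can fail elementwise because of floating-point rounding, so torch.allclose is probably the safer check here.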