Consider the following inputs:
```python
import torch

a = torch.tensor([-3.], dtype=torch.float64)
weight = torch.tensor([1e10], dtype=torch.float64)
bias = torch.tensor([-1], dtype=torch.float64)
normalaxis = (1,)
eps = 0.1
torch.nn.functional.layer_norm(a, normalaxis, weight=weight, bias=bias, eps=eps)
```
This returns -1., as expected. In fact, for any weight up to 1e10, the return value is -1. However, when the weight is increased beyond 1e10, the return value changes (it first decreases and then increases). This makes no sense to me: the mean here is always -3, so the weight always gets multiplied by (a - mean) = 0 according to the layer_norm update equation, which means the answer should always be -1 (equal to the bias). What am I missing here?
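For reference, here is a minimal sketch of the reasoning above, computing the layer_norm formula by hand (manual mean and population variance, not PyTorch's internal kernel, so exact agreement with `F.layer_norm` is an assumption):

```python
import torch

a = torch.tensor([-3.], dtype=torch.float64)
weight = torch.tensor([1e10], dtype=torch.float64)
bias = torch.tensor([-1.], dtype=torch.float64)
eps = 0.1

# Hand-rolled layer_norm: (a - mean) / sqrt(var + eps) * weight + bias
mean = a.mean()
var = a.var(unbiased=False)  # population variance, as layer_norm uses
out = (a - mean) / torch.sqrt(var + eps) * weight + bias

# With a single element, (a - mean) is exactly 0 in this manual version,
# so the output equals the bias regardless of the weight.
print(out)  # tensor([-1.], dtype=torch.float64)
```

In this hand-rolled version `a - mean` is an exact zero, so the weight is annihilated no matter how large it is; the question is why the built-in `layer_norm` behaves differently for very large weights.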