Consider the following inputs:
import torch
a = torch.tensor([-3.], dtype=torch.float64)
weight = torch.tensor([1e10], dtype=torch.float64)
bias = torch.tensor([-1.], dtype=torch.float64)
normalaxis = (1,)
eps = 0.1
torch.nn.functional.layer_norm(a, normalaxis, weight=weight, bias=bias, eps=eps)
This returns -1. as expected. In fact, for any weight between 1e0 and 1e10, the return value is -1. However, when the weight is increased beyond 1e10, the return value changes (it first decreases and then increases). This makes no sense to me: the input has a single element, so the mean is always -3, which makes (a - mean) exactly 0. The weight therefore always multiplies 0 in the layer_norm update equation, and the answer should always be -1 (equal to the bias). What am I missing here?
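For reference, here is the computation I expect layer_norm to perform, written out by hand from the documented formula (my own sketch, not PyTorch's actual kernel). With a single element, (a - mean) is exactly 0 in float64, so the result is the bias regardless of the weight:

```python
import torch

a = torch.tensor([-3.], dtype=torch.float64)
weight = torch.tensor([1e15], dtype=torch.float64)  # well beyond 1e10
bias = torch.tensor([-1.], dtype=torch.float64)
eps = 0.1

# Manual layer_norm over the last dimension, following the textbook formula
mean = a.mean(dim=-1, keepdim=True)                 # exactly -3.0
var = a.var(dim=-1, unbiased=False, keepdim=True)   # exactly 0.0
out = (a - mean) / torch.sqrt(var + eps) * weight + bias

print(out)  # tensor([-1.], dtype=torch.float64)
```

This manual version returns -1. even for weight = 1e15, which is why the deviation in torch.nn.functional.layer_norm surprises me.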