Numerical stability of torch.randn (normal distribution)

So I’ve been testing torch.randn and I am not sure if I am doing something wrong, but the routine seems to suffer from some numerical instability. Here is a notebook for replicating my results in Colab: Google Colab
I am using a T4 GPU.

import torch

h = torch.randn(100000000, dtype=torch.float64, device='cuda')
eps = torch.finfo(torch.float64).eps
print(f'Eps number {eps}')
print(f'Mean {h.mean(dtype=torch.float64)}')
print(f'Std {h.std()}')


Output:

Eps number 2.220446049250313e-16
Mean 3.946074394742064e-05
Std 0.9999776430693459

Though the mean is close to zero, it is far away from the reported eps, at least a factor of 3 bigger. I ran some bootstrap simulations and it does not seem to be biased: a thousand simulations of one million points each.

Histograms of the simulated Mean and Std: [plots omitted]

I also computed the quantiles, and you can see that in 99% of the simulations the mean is at least 3 times the eps value away from zero.
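
Roughly, the bootstrap looks like the sketch below (a simplified version; the exact code is in the notebook above and may differ in detail):

import torch

n_sims, n_points = 1000, 1_000_000  # a thousand simulations of one million points each

means, stds = [], []
for _ in range(n_sims):
    x = torch.randn(n_points, dtype=torch.float64, device='cuda')
    means.append(x.mean().item())
    stds.append(x.std().item())

means = torch.tensor(means, dtype=torch.float64)
eps = torch.finfo(torch.float64).eps

# quantiles of the absolute simulated means, compared with eps
print(f'1% quantile of |mean|: {means.abs().quantile(0.01)}')
print(f'99% quantile of |mean|: {means.abs().quantile(0.99)}')
print(f'eps: {eps}')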

Hi Uumami!

This is to be expected.

The mean of a bunch of (independent, identically-distributed) random numbers has standard deviation sigma / sqrt(n), where sigma is the standard deviation of the individual samples and n is the number of samples over which you take the mean.*

randn() generates samples that have standard deviation 1.0, and since you take the mean of 1.e8 samples, the standard deviation of your mean will be 1.e-4. This means that you don’t expect your mean to be “very close” to zero, but, rather, “about” 1.e-4 away from zero.
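
Here is a quick check of that arithmetic (a minimal example; the exact mean you get will vary from run to run):

import torch

n = 100_000_000                  # same n as in your snippet
sigma = 1.0                      # randn() samples have standard deviation 1.0
print(sigma / n**0.5)            # expected standard deviation of the mean: 1e-4

h = torch.randn(n, dtype=torch.float64)   # the cpu is fine for this check
print(abs(h.mean().item()))      # typically of order 1e-4, nowhere near eps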

So the mean you calculate of 3.946e-5 is a perfectly expected value.

*) This is discussed in many places: one reference would be Wikipedia’s Bienaymé’s identity. (Don’t let the generality of this reference put you off; the simple version we are using is a well-known result from elementary statistics.)
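
Spelled out for the simple case we are using: if X_1, ..., X_n are independent with Var(X_i) = sigma^2, then

Var((1/n) * sum_i X_i) = (1/n^2) * sum_i Var(X_i) = sigma^2 / n,

so the standard deviation of the mean is sigma / sqrt(n).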

Best.

K. Frank
