How does the function torch.randn generate variates? What specific method does it use, and where can I find a reference (if there is one)? Also, does the method change based on CPU/GPU usage?
As near as I can tell, randn() uses the Box-Muller method on both the cpu and the gpu.
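For reference, here is the textbook form of the Box-Muller transform. Treat it as a sketch: the actual cpu and gpu kernels may differ in details, such as how they exclude $U_1 = 0$. It maps two independent uniform variates $U_1, U_2$ to two independent standard normal variates $Z_0, Z_1$:

```latex
Z_0 = \sqrt{-2 \ln U_1}\,\cos(2\pi U_2), \qquad
Z_1 = \sqrt{-2 \ln U_1}\,\sin(2\pi U_2),
\qquad U_1 \in (0, 1],\; U_2 \in [0, 1).
```

The radius $\sqrt{-2 \ln U_1}$ and the angle $2\pi U_2$ are shared by both outputs. This is why a Box-Muller implementation naturally produces normals two at a time.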
I am not aware that the algorithm used is documented anywhere
but in the code. It's also quite opaque to me how any given pytorch
python call gets dispatched down to the code that actually does the
work.
But my best guess for the cpu is:
and for the gpu is:
The gpu code calls down into the curand_normal() set of functions in NVIDIA's cuRAND library.
Quoting from the cuRAND documentation:
__device__ float2 curand_normal2 (curandState_t *state) ...
The above functions generate two normally or log normally distributed pseudorandom results with each call. Because the underlying implementation uses the Box-Muller transform, this is generally more efficient than generating a single result with each call.
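The following is a minimal pure-Python sketch of the textbook Box-Muller transform. It shows why producing results in pairs is cheaper than producing them one at a time. It is not PyTorch's or cuRAND's code. The names `box_muller_pair` and `PairedNormalSampler` are hypothetical, and Python's `random` module stands in for whatever uniform generator the real implementations use (an assumption). Each pair of uniforms costs one log, one square root, and one cos/sin evaluation, and it yields two normals. So the sampler keeps the second value for the next call instead of throwing it away.

```python
import math
import random


def box_muller_pair(rng: random.Random) -> tuple[float, float]:
    """Turn two independent uniforms into two independent N(0, 1) samples."""
    # u1 must be strictly positive because log(0) is undefined;
    # 1.0 - random() maps [0, 1) onto (0, 1].
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))  # shared by both outputs
    angle = 2.0 * math.pi * u2               # shared by both outputs
    return radius * math.cos(angle), radius * math.sin(angle)


class PairedNormalSampler:
    """Hands out one normal per call but computes them two at a time (hypothetical helper)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._spare = None  # second value of the last pair, if not yet returned

    def sample(self) -> float:
        # Reuse the cached second value instead of generating a new pair.
        if self._spare is not None:
            value, self._spare = self._spare, None
            return value
        z0, z1 = box_muller_pair(self._rng)
        self._spare = z1
        return z0


if __name__ == "__main__":
    sampler = PairedNormalSampler(seed=42)
    xs = [sampler.sample() for _ in range(100_000)]
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    print(f"mean={mean:.3f} var={var:.3f}")  # should be close to 0 and 1
```

Only every other call to `sample()` does any transcendental math. This mirrors the point in the cuRAND documentation that generating two results per call is generally more efficient than generating one.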
As described above, the cpu and gpu versions are implemented
entirely independently. It appears, however, that they both end up
using the same underlying Box-Muller algorithm (as near as I can tell …).
Hi K. Frank,
Thanks a lot for the detailed answer! Box-Muller makes sense, and thank you for finding the source code.
Now I am wondering how the uniform RVs are generated, and how this is done efficiently on the GPU. I guess this is also done in cuRAND. I will dig around and try to find out.
Thanks and best regards,