Random number generation speed

What is the fastest way to generate random numbers on GPUs?

In-place sampling methods, e.g. torch.cuda.*Tensor(*sizes).normal_(mu, sigma)

doc: http://pytorch.org/docs/master/torch.html#in-place-random-sampling
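
For example, a minimal sketch of this approach (the values of mu and sigma are arbitrary here):

import torch

# Allocate a 100x100 float tensor directly on the GPU, then fill it
# in place with samples from a normal distribution with mean mu and
# standard deviation sigma.
mu, sigma = 0.0, 1.0
x = torch.cuda.FloatTensor(100, 100).normal_(mu, sigma)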

I was curious, so I benchmarked it:

In [5]: def generate1():
   ...:     # sample on the CPU, then copy the result to the GPU
   ...:     torch.randn(100, 100).cuda()
   ...:     torch.cuda.synchronize()
   ...:
   ...: def generate2():
   ...:     # sample directly into a preallocated GPU tensor
   ...:     x = torch.cuda.FloatTensor(100, 100)
   ...:     torch.randn(100, 100, out=x)
   ...:     torch.cuda.synchronize()
   ...:
   ...: def generate3():
   ...:     # allocate on the GPU and fill in place
   ...:     torch.cuda.FloatTensor(100, 100).normal_(0, 1)
   ...:     torch.cuda.synchronize()
   ...:
   ...: %timeit generate1()
   ...: %timeit generate2()
   ...: %timeit generate3()
   ...:
275 µs ± 2.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
25.2 µs ± 3.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
19.7 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)

And yes, Simon’s method (generate3) looks to be the fastest of these: generate1 pays for CPU-side sampling plus a host-to-device copy, while generate2 and generate3 sample directly on the GPU.
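
For completeness, on PyTorch 0.4 and later the same direct-on-GPU sampling can also be written with the device argument instead of the legacy torch.cuda.FloatTensor constructor; this is a sketch equivalent in spirit to generate3, not something benchmarked above:

import torch

def generate4():
    # Sample N(0, 1) straight into a freshly allocated GPU tensor,
    # avoiding both the CPU-side sampling and the host-to-device
    # copy that make generate1 slow.
    torch.randn(100, 100, device='cuda')
    torch.cuda.synchronize()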
