What is the fastest way to generate random numbers on GPUs?
In-place sampling methods, e.g. torch.cuda.*Tensor(*sizes).normal_(mu, sigma)
doc: http://pytorch.org/docs/master/torch.html#in-place-random-sampling
I was curious so I benchmarked it:
In [5]: def generate1():
...: torch.randn(100, 100).cuda()
...: torch.cuda.synchronize()
...:
...: def generate2():
...: x = torch.cuda.FloatTensor(100, 100)
...: torch.randn(100, 100, out=x)
...: torch.cuda.synchronize()
...:
...: def generate3():
...: torch.cuda.FloatTensor(100, 100).normal_(0, 1)
...: torch.cuda.synchronize()
...:
...: %timeit generate1()
...: %timeit generate2()
...: %timeit generate3()
...:
275 µs ± 2.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
25.2 µs ± 3.28 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
19.7 µs ± 2.46 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
And yes, Simon’s in-place method (generate3) looks to be the fastest of these: allocating the tensor on the GPU and filling it in place avoids both the host-side generation and the host-to-device copy that generate1 pays for.
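For what it’s worth, on newer PyTorch releases the same idea can be written with the device argument to the factory functions, which samples directly on the GPU in one call. This is a minimal sketch, not part of the benchmark above; the CPU fallback is an assumption added so the snippet runs without a CUDA device.

```python
import torch

# Assumption: fall back to CPU when no CUDA device is present,
# so the sketch stays runnable anywhere.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Allocate on the target device and fill in place with N(0, 1) samples --
# no host-side generation or host-to-device copy involved (generate3's idea).
x = torch.empty(100, 100, device=device).normal_(0, 1)

# One-step factory equivalent; device= avoids the CPU round trip
# that torch.randn(...).cuda() incurs (generate1's overhead).
y = torch.randn(100, 100, device=device)
```

Both tensors end up on the chosen device without ever materializing on the host, which is exactly why generate1 is an order of magnitude slower in the timings above.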