Hi, all. I’m a newbie to PyTorch and have a question.
I have been trying to generate random numbers from a uniform distribution, and I recently found that torch.rand() runs much faster than numpy.random.rand().
For example, I ran the following script:
setup1 = """
import numpy as np
import torch

def test1():
    y = torch.FloatTensor(np.random.rand(128, 25000))
"""

setup2 = """
import torch

def test2():
    y = torch.rand(128, 25000)
"""

if __name__ == "__main__":
    import timeit
    print(timeit.timeit("test1()", setup=setup1, number=100))
    print(timeit.timeit("test2()", setup=setup2, number=100))
and got the following output:
14.177617023233324
1.8277414003387094
It looks like torch.rand() is much faster than numpy.random.rand(), but I still don’t understand why.
I’d be very grateful if someone could tell me why.
Thanks.
alexis-jacq (Alexis David Jacq), February 1, 2018, 10:26am
I can’t reproduce your results; in my configuration I obtain:
numpy: 3.5303005139999186 (without the torch.FloatTensor initialisation)
torch: 2.5097883099999763
However, torch’s generator is still faster. If you are curious, you can compare the source implementations of both libraries:
As far as I know, torch.rand uses THRandom.c:
#include "THGeneral.h"
#include "THRandom.h"
#ifndef _WIN32
#include <fcntl.h>
#include <unistd.h>
#endif
/* Code for the Mersenne Twister random generator.... */
#define n _MERSENNE_STATE_N
#define m _MERSENNE_STATE_M
/* Creates (unseeded) new generator*/
static THGenerator* THGenerator_newUnseeded()
{
  THGenerator *self = THAlloc(sizeof(THGenerator));
  memset(self, 0, sizeof(THGenerator));
  self->left = 1;
  self->seeded = 0;
  self->normal_is_valid = 0;
  /* ... file truncated ... */
And numpy.random.rand, I think, is based on:
/*
* These function have been adapted from Python 2.4.1's _randommodule.c
*
* The following changes have been made to it in 2005 by Robert Kern:
*
* * init_by_array has been declared extern, has a void return, and uses the
* rk_state structure to hold its data.
*
* The original file has the following verbatim comments:
*
* ------------------------------------------------------------------
* The code in this module was based on a download from:
* http://www.math.keio.ac.jp/~matumoto/MT2002/emt19937ar.html
*
* It was modified in 2002 by Raymond Hettinger as follows:
*
* * the principal computational lines untouched except for tabbing.
*
* * renamed genrand_res53() to random_random() and wrapped
* in python calling/return code.
 *
 * ... file truncated ...
 */
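Both excerpts are implementations of the same Mersenne Twister (MT19937) algorithm, so the raw generators are comparable. One concrete difference between the two calls, though, is the output dtype: numpy.random.rand always produces float64, while torch.rand uses torch’s default dtype, float32, so numpy has to generate and write twice as many bytes per element. A quick check (just a sketch; it assumes both libraries are installed):

```python
import numpy as np
import torch

# numpy.random.rand always returns 64-bit floats...
print(np.random.rand(2, 2).dtype)   # float64

# ...while torch.rand returns torch's default dtype, 32-bit floats
print(torch.rand(2, 2).dtype)       # torch.float32
```

This dtype gap alone doesn’t fully explain the measured difference, but it is one factor to keep in mind when comparing the two generators head to head.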
I modified your code a bit:
setup1 = """
import numpy as np
import torch

def test1():
    y = np.random.rand(128, 25000)
    y = torch.from_numpy(y)
"""

setup2 = """
import torch

def test2():
    y = torch.rand(128, 25000)
"""

if __name__ == "__main__":
    import timeit
    print(timeit.timeit("test1()", setup=setup1, number=100))
    print(timeit.timeit("test2()", setup=setup2, number=100))
Instead of passing the NumPy array directly into the tensor constructor, using torch.from_numpy is much faster.
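The speedup makes sense because torch.from_numpy returns a tensor that shares memory with the array rather than copying it. A small sketch to confirm the sharing:

```python
import numpy as np
import torch

a = np.zeros(3, dtype=np.float32)
t = torch.from_numpy(a)   # wraps the same buffer, no copy
a[0] = 1.0                # mutate the numpy array in place
print(t[0].item())        # the tensor sees the change: 1.0
```

Because the buffer is shared, the conversion is O(1) regardless of array size, whereas constructing a new tensor from the array has to copy (and here also cast) every element.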
These are the results I got by using the above code.
3.1434557730099186 --> numpy
2.5963897299952805 --> torch
These are the results I got when I used your original code.
19.75996291998308 --> numpy
2.466043735970743 --> torch
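To separate the conversion cost from the generation cost, one can time just the two conversion paths on a pre-generated array (a sketch; absolute numbers will vary by machine):

```python
import timeit

# Generate the array once in setup so only the conversion is timed
setup = "import numpy as np, torch; a = np.random.rand(128, 25000)"

# Copy-and-cast path, as in the original code
copy_path = timeit.timeit("torch.FloatTensor(a)", setup=setup, number=100)
# Zero-copy path: wraps the existing buffer
view_path = timeit.timeit("torch.from_numpy(a)", setup=setup, number=100)

print(f"copy: {copy_path:.3f}s  view: {view_path:.3f}s")
```

On any machine the view path should come out far cheaper, since it never touches the 128 × 25000 elements.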
Thank you for your reply!
Okay, it seems like copying a NumPy array into a torch tensor is relatively slow.