I am using torch.rand() to generate large, uniformly distributed tensors. For large tensor sizes this gets incredibly slow. My use case is real-time inference, so ideally each call should take minimal time.
To benchmark, I generate a 3D tensor of uniform random values:
U = torch.autograd.Variable(torch.rand(500, 128, mult)).cuda(0)
where mult is the integer in the first column of the table below, and the mean time (in seconds) to generate this tensor is in the second column.
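For reference, here is a minimal sketch of how I am timing this call. The mult value of 64 is just a placeholder for the table entries, and the code falls back to CPU when no GPU is present; torch.cuda.synchronize() is used because CUDA kernels and copies launch asynchronously, so wall-clock timing without it would be misleading.

```python
import time
import torch

mult = 64  # placeholder; substitute the sizes from the table

have_cuda = torch.cuda.is_available()

def generate():
    u = torch.rand(500, 128, mult)        # generated on the CPU...
    return u.cuda(0) if have_cuda else u  # ...then copied to the GPU

def mean_time(n_iters=10):
    generate()  # warm-up call, so one-time CUDA init is not counted
    if have_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(n_iters):
        generate()
    if have_cuda:
        torch.cuda.synchronize()  # wait for async kernels/copies to finish
    return (time.perf_counter() - start) / n_iters

print(f"mean time per call: {mean_time():.6f} s")
```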
I have also tried the same benchmark using the new pyro library, with similar results. Are there any ways to speed this up?