Hey folks,

I need to call torch.rand() to generate large tensors of uniformly distributed values. For large tensor sizes, this becomes very slow. My use case is real-time inference, so each call should take as little time as possible.

To benchmark this, I generate a 3D uniform tensor and move it to the GPU:

U = torch.autograd.Variable(torch.rand(500, 128, mult)).cuda(0)
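For reference, a mean time like the ones below could be measured with a small helper such as this. It is only a sketch and not necessarily how these numbers were taken: `mean_time`, its `repeats`/`warmup` defaults and the optional `sync` hook are hypothetical names. Because CUDA kernels are launched asynchronously, GPU timings are only meaningful if the device is synchronized before the clock is stopped, which is what `sync` is for.

```python
import time
from statistics import mean
from typing import Callable, Optional


def mean_time(fn: Callable[[], object], repeats: int = 20, warmup: int = 3,
              sync: Optional[Callable[[], None]] = None) -> float:
    """Return the mean wall-clock time of fn() in seconds.

    sync (hypothetical hook), if given, is called once after the warm-up and
    after every timed call, so asynchronous work such as queued CUDA kernels
    is included in each measurement.
    """
    # Warm-up calls absorb one-time costs such as lazy initialisation.
    for _ in range(warmup):
        fn()
    if sync is not None:
        sync()  # make sure warm-up work has finished before timing starts
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for queued work to finish before stopping the clock
        times.append(time.perf_counter() - start)
    return mean(times)
```

Applied to the call above, it could be used as `mean_time(lambda: torch.autograd.Variable(torch.rand(500, 128, mult)).cuda(0), sync=torch.cuda.synchronize)`.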

In the table below, the first column is the value of mult and the second column is the mean time (in seconds) it takes to generate this tensor.

mult    mean time (s)
1       0.00098851680755615242
2       0.0014483594894409179
4       0.0021221637725830078
8       0.0035622072219848632
16      0.0069727325439453121
32      0.013092503547668458
64      0.02614020347595215
128     0.052162947654724123
256     0.1088571834564209
512     0.21757780075073241
1024    0.43543507575988771
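As a quick sanity check on how the cost scales, the snippet below converts each timing into time per generated element. It uses only the numbers from the table and the tensor shape (500, 128, mult); the names `TIMINGS` and `ns_per_element` are just for illustration.

```python
# (mult, mean time in seconds) pairs copied from the table above.
TIMINGS = [
    (1, 0.00098851680755615242),
    (2, 0.0014483594894409179),
    (4, 0.0021221637725830078),
    (8, 0.0035622072219848632),
    (16, 0.0069727325439453121),
    (32, 0.013092503547668458),
    (64, 0.02614020347595215),
    (128, 0.052162947654724123),
    (256, 0.1088571834564209),
    (512, 0.21757780075073241),
    (1024, 0.43543507575988771),
]

ELEMENTS_PER_MULT = 500 * 128  # size of the first two tensor dimensions


def ns_per_element(mult: int, seconds: float) -> float:
    """Mean generation time per tensor element, in nanoseconds."""
    return seconds / (ELEMENTS_PER_MULT * mult) * 1e9


for mult, seconds in TIMINGS:
    print(f"mult={mult:5d}  elements={ELEMENTS_PER_MULT * mult:>11,d}  "
          f"{ns_per_element(mult, seconds):.2f} ns/element")
```

From mult = 8 upward the cost stays at roughly 6.4 to 7 ns per element, so the total time grows linearly with the number of elements; the smallest sizes carry some fixed per-call overhead on top of that.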

I have also tried the same benchmark with the new Pyro library, with similar results. Are there any ways to speed this up?

Thank you.