Slow torch.rand() call for large tensor sizes

Hey folks,

I need to use the torch.rand() call to generate large uniformly distributed tensors. For large tensor sizes, this gets incredibly slow. My use case is for real-time inference, so ideally each call to this function should take a minimal amount of time.

To benchmark, I generate a 3D uniform tensor:

U = torch.autograd.Variable(torch.rand(500, 128, mult)).cuda(0)

where mult is the integer in the first column of the table below; the second column gives the mean time (in seconds) it takes to generate the tensor.

mult    mean time (s)
1       0.00099
2       0.00145
4       0.00212
8       0.00356
16      0.00697
32      0.01309
64      0.02614
128     0.05216
256     0.10886
512     0.21758
1024    0.43544
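For reference, a minimal sketch of how such a benchmark could be run (the repetition count and the mult value here are illustrative, not the poster's exact script):

```python
import time
import torch

# Hypothetical re-creation of the benchmark above: time torch.rand()
# on the CPU for a given mult and return the mean per-call duration.
def bench_rand(mult, reps=3):
    start = time.perf_counter()
    for _ in range(reps):
        u = torch.rand(500, 128, mult)  # random generation on the CPU
    return (time.perf_counter() - start) / reps

mean_s = bench_rand(mult=2)
```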

I have also tried the same benchmark with the new Pyro library, with similar results. Is there any way to speed this up?

Thank you.

You are using the CPU kernel to generate the random numbers and then copying the result to the GPU. For large tensors it is much faster to generate directly on the GPU. Try this instead: torch.cuda.FloatTensor(500, 128, mult).uniform_() (note uniform_() rather than normal_(), since you want a uniform distribution).
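A runnable sketch of that suggestion (the mult value is illustrative; the CPU branch exists only so the snippet runs on machines without CUDA):

```python
import torch

mult = 64  # illustrative size
if torch.cuda.is_available():
    # Allocate directly on the GPU and fill in place with uniform
    # noise, so no host-to-device copy is needed.
    U = torch.cuda.FloatTensor(500, 128, mult).uniform_()
else:
    # CPU fallback with the same in-place fill.
    U = torch.FloatTensor(500, 128, mult).uniform_()
```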

Also, it might be the .cuda() call that is taking the majority of the time.
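One way to check this is to time the generation and the host-to-device copy separately (sizes are illustrative). Note that CUDA calls are asynchronous, so torch.cuda.synchronize() is needed around the copy for an honest timing; without a CUDA device the copy step is simply skipped:

```python
import time
import torch

# Time the CPU random generation on its own.
t0 = time.perf_counter()
u_cpu = torch.rand(500, 128, 8)
t_generate = time.perf_counter() - t0

# Time the host-to-device copy separately, if a GPU is present.
t_copy = 0.0
if torch.cuda.is_available():
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    u_gpu = u_cpu.cuda(0)
    torch.cuda.synchronize()
    t_copy = time.perf_counter() - t0
```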

Oh, that’s a massive speedup. Thank you very much!