Slow torch.rand() call for large tensor sizes

Hey folks,

I need torch.rand() to generate large uniformly distributed tensors, and for large sizes the call gets incredibly slow. My use case is real-time inference, so each call should ideally take as little time as possible.

To benchmark, I am generating a 3D uniform tensor:

U = torch.autograd.Variable(torch.rand(500, 128, mult)).cuda(0)

where mult takes the values in the first column of the table below, and the second column is the mean time in seconds per call. A sketch of the timing loop follows the table.

mult    mean time (s)
1       0.00098851680755615242
2       0.0014483594894409179
4       0.0021221637725830078
8       0.0035622072219848632
16      0.0069727325439453121
32      0.013092503547668458
64      0.02614020347595215
128     0.052162947654724123
256     0.1088571834564209
512     0.21757780075073241
1024    0.43543507575988771
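The timing loop looks roughly like this (a minimal sketch; the iteration count and averaging are arbitrary choices on my part):

import time
import torch
from torch.autograd import Variable

def mean_gen_time(mult, n_iters=100):
    # Generate on the CPU, copy to GPU 0, and average the wall-clock time.
    start = time.time()
    for _ in range(n_iters):
        U = Variable(torch.rand(500, 128, mult)).cuda(0)
    torch.cuda.synchronize()  # make sure the asynchronous copies have finished
    return (time.time() - start) / n_iters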

I have also tried the same benchmark using the new pyro library, with similar results. Are there any ways to speed this up?

Thank you.

You are using the CPU kernel to generate the random numbers and then copying the result to the GPU. For large tensors, generating directly on the GPU is preferred. Since you want uniform samples, try this instead: torch.cuda.FloatTensor(500, 128, mult).uniform_().
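For example (a sketch; allocating the tensor once and refilling it in place on each call is a suggestion, not required, but it also avoids repeated allocations):

import torch

mult = 1024
# Allocate once on the GPU; uniform_() refills it in place with
# uniform samples in [0, 1) using the GPU's random number generator.
U = torch.cuda.FloatTensor(500, 128, mult)
U.uniform_()
# Wrap in a Variable if the rest of your code still expects one:
V = torch.autograd.Variable(U)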

Also, it might be the .cuda() call that is taking the majority of the time, rather than torch.rand() itself.
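You can check this by timing the two steps separately; CUDA calls are asynchronous, so synchronize before reading the clock (a sketch):

import time
import torch

start = time.time()
cpu_tensor = torch.rand(500, 128, 1024)  # CPU-side generation
gen_done = time.time()
gpu_tensor = cpu_tensor.cuda(0)          # host-to-device copy
torch.cuda.synchronize()                 # wait for the copy to complete
copy_done = time.time()
print('generate: %.4f s, copy: %.4f s'
      % (gen_done - start, copy_done - gen_done))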


Oh, that’s a massive speedup. Thank you very much!