Hi,
Is it possible to use GPU accelerated factorizations? I can’t find any information in the documentation and it seems that in version “0.3.0.post4” pytorch copies the tensor onto RAM and runs a CPU implementation of the factorizations.
torch.qr and torch.svd both run on the GPU if your input is a CUDA tensor. They may also run parts of the algorithm on the CPU.
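A minimal sketch of that dispatch behavior: the factorization routine picks its backend from the input tensor's device, and the factors come back on that same device. This uses torch.linalg.qr, the current name for torch.qr in recent PyTorch releases; the CUDA branch only runs if a GPU is present.

```python
import torch

# Hedged sketch: QR dispatches based on the input tensor's device.
x = torch.randn(400, 400)
if torch.cuda.is_available():
    x = x.cuda()  # a CUDA input selects the GPU code path

q, r = torch.linalg.qr(x)

# The factors live on the same device as the input,
# and Q @ R reconstructs the original matrix.
print(q.device == x.device)  # True
```

Note that even on the GPU path, parts of the algorithm (e.g. panel factorization) may still execute on the CPU, which matches the answer above.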
I also thought that the GPU should be used if the input tensor is a CUDA tensor, but running

T = torch.randn(400, 400).cuda()
for i in range(100):
    somevar = torch.qr(T)

causes only a marginal increase in volatile GPU utilization (8-10%) while saturating all CPU cores.
Hi, have you solved your problem? I have the same problem as you.
I simply move the data to the CPU to run the QR decomposition.
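The workaround above can be sketched as a small helper: factorize on the CPU, then move the factors back to wherever the input lived. The helper name `qr_on_cpu` is illustrative, not an existing API, and I use torch.linalg.qr (the current name for torch.qr).

```python
import torch

# Illustrative helper (not a PyTorch API): run QR on the CPU,
# then return the factors on the input's original device.
def qr_on_cpu(t):
    q, r = torch.linalg.qr(t.cpu())
    return q.to(t.device), r.to(t.device)

a = torch.randn(400, 400)
q, r = qr_on_cpu(a)
```

The round trip costs two host/device copies per call, so it only pays off when the CPU factorization is much faster than the GPU one.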
@YiifeiWang this may be faster depending on your setup. I found that running QR on a 28-core / 56-virtual-CPU machine is about 2x faster than running it on an NVIDIA Titan.
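If you want to check which is faster on your own hardware, a rough timing sketch follows. The matrix size and iteration count are arbitrary; the torch.cuda.synchronize() calls matter because CUDA kernels launch asynchronously, so without them the GPU timing would be meaningless.

```python
import time
import torch

# Rough benchmark sketch: average seconds per QR call on a given device.
def time_qr(t, iters=10):
    if t.is_cuda:
        torch.cuda.synchronize()  # wait for pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        torch.linalg.qr(t)
    if t.is_cuda:
        torch.cuda.synchronize()  # ensure all launched kernels finished
    return (time.perf_counter() - start) / iters

x = torch.randn(400, 400)
print(f"CPU: {time_qr(x):.5f} s/iter")
if torch.cuda.is_available():
    print(f"GPU: {time_qr(x.cuda()):.5f} s/iter")
```

As noted above, which side wins depends heavily on core count, GPU model, and matrix size, so it's worth measuring on the actual machine.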