GPU implementation of QR/SVD


Is it possible to use GPU-accelerated factorizations? I can’t find any information in the documentation, and it seems that in version “0.3.0.post4” PyTorch copies the tensor into host RAM and runs a CPU implementation of the factorizations.

torch.qr and torch.svd both run on the GPU if your input is a CUDA tensor. They might also run parts of the algorithm on the CPU.


I also thought that a CUDA input tensor would make it use the GPU, but running

import torch

T = torch.randn(400, 400).cuda()
for i in range(100):
    somevar = torch.qr(T)

causes only a marginal increase in volatile GPU utilization (8-10%) but saturates all CPU cores.

Hi, have you solved your problem? I have the same problem as you.

I simply move the data to the CPU to run the QR decomposition.
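
For anyone wanting to try the same workaround, here is a minimal sketch. It assumes a newer PyTorch release that provides torch.linalg.qr (in the 0.3.x version discussed above the call would be torch.qr). The helper name qr_on_cpu is made up for illustration and is not part of PyTorch.

```python
import torch

def qr_on_cpu(t):
    """Hypothetical helper: factorize on the CPU, return Q and R on t's original device."""
    # .cpu() is a no-op if the tensor already lives on the CPU.
    q, r = torch.linalg.qr(t.cpu())
    # Move both factors back to wherever the input came from.
    return q.to(t.device), r.to(t.device)

# Use the GPU when one is present; otherwise the sketch still runs on the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
T = torch.randn(400, 400, device=device)
Q, R = qr_on_cpu(T)

# Sanity check: Q @ R should reconstruct T up to float32 rounding.
print(torch.allclose(Q @ R, T, atol=1e-3))
```

The two device transfers add overhead, so whether this pays off depends on the matrix size and hardware.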

@YiifeiWang this may be faster depending on your setup. I found that running QR on a 28-core / 56-thread CPU machine is 2x faster than running it on an NVIDIA Titan.
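
Since the result depends on the setup, it may be worth measuring on your own machine. Below is a rough timing sketch, again assuming a PyTorch release with torch.linalg.qr. The helper name time_qr and the repeat count are arbitrary choices for illustration. CUDA calls are asynchronous, so the sketch synchronizes before reading the clock.

```python
import time
import torch

def time_qr(t, repeats=10):
    """Hypothetical helper: average wall-clock seconds per QR call on t's device."""
    torch.linalg.qr(t)  # warm-up call so one-time setup cost is not measured
    if t.is_cuda:
        torch.cuda.synchronize()  # wait for pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.linalg.qr(t)
    if t.is_cuda:
        torch.cuda.synchronize()  # make sure all GPU work has finished
    return (time.perf_counter() - start) / repeats

T = torch.randn(400, 400)
print("cpu seconds per call:", time_qr(T))
if torch.cuda.is_available():
    print("gpu seconds per call:", time_qr(T.cuda()))
```

Numbers from a sketch like this are only indicative; matrix size, dtype, and the number of CPU threads PyTorch uses all shift the outcome.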