Hi,
Is it possible to use GPU accelerated factorizations? I can’t find any information in the documentation and it seems that in version “0.3.0.post4” pytorch copies the tensor onto RAM and runs a CPU implementation of the factorizations.
torch.qr and torch.svd both run on the GPU if your input is a CUDA tensor. They may also run parts of the algorithm on the CPU.
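A minimal sketch of that dispatch behavior: the factorization routine picks its backend from the input tensor's device, and the factors come back on that same device. This uses torch.linalg.qr, the current name for torch.qr in recent PyTorch releases; the CUDA branch only runs if a GPU is present.

```python
import torch

# Hedged sketch: QR dispatches based on the input tensor's device.
x = torch.randn(400, 400)
if torch.cuda.is_available():
    x = x.cuda()  # a CUDA input selects the GPU code path

q, r = torch.linalg.qr(x)

# The factors live on the same device as the input,
# and Q @ R reconstructs the original matrix.
print(q.device == x.device)  # True
```

Note that even on the GPU path, parts of the algorithm (e.g. panel factorization) may still execute on the CPU, which matches the answer above.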
I also thought that the GPU should be used if the input tensor is a CUDA tensor, but running

T = torch.randn(400, 400).cuda()
for i in range(100):
    somevar = torch.qr(T)

causes only a marginal increase in volatile GPU utilization (8-10%) while saturating all CPU cores.
Hi, have you solved your problem? I have the same problem as you.
I simply move the data to the CPU to run the QR decomposition.
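The workaround above can be sketched as a small helper: factorize on the CPU, then move the factors back to wherever the input lived. The helper name `qr_on_cpu` is illustrative, not an existing API, and I use torch.linalg.qr (the current name for torch.qr).

```python
import torch

# Illustrative helper (not a PyTorch API): run QR on the CPU,
# then return the factors on the input's original device.
def qr_on_cpu(t):
    q, r = torch.linalg.qr(t.cpu())
    return q.to(t.device), r.to(t.device)

a = torch.randn(400, 400)
q, r = qr_on_cpu(a)
```

The round trip costs two host/device copies per call, so it only pays off when the CPU factorization is much faster than the GPU one.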
@YiifeiWang this may be faster depending on your setup. I found that running QR on a 28-core / 56-virtual-CPU machine is about 2x faster than running it on an NVIDIA Titan.
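If you want to check which is faster on your own hardware, a rough timing sketch follows. The matrix size and iteration count are arbitrary; the torch.cuda.synchronize() calls matter because CUDA kernels launch asynchronously, so without them the GPU timing would be meaningless.

```python
import time
import torch

# Rough benchmark sketch: average seconds per QR call on a given device.
def time_qr(t, iters=10):
    if t.is_cuda:
        torch.cuda.synchronize()  # wait for pending GPU work before timing
    start = time.perf_counter()
    for _ in range(iters):
        torch.linalg.qr(t)
    if t.is_cuda:
        torch.cuda.synchronize()  # ensure all launched kernels finished
    return (time.perf_counter() - start) / iters

x = torch.randn(400, 400)
print(f"CPU: {time_qr(x):.5f} s/iter")
if torch.cuda.is_available():
    print(f"GPU: {time_qr(x.cuda()):.5f} s/iter")
```

As noted above, which side wins depends heavily on core count, GPU model, and matrix size, so it's worth measuring on the actual machine.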