What backend for linear algebra does the precompiled PyTorch use?

When using the precompiled PyTorch (on Windows) from the official homepage, which backend is used for linear algebra on the GPU? Is it MAGMA or something else? As far as I understand (possibly incorrectly), this can be chosen when compiling from source.

Background: I want to do batched LU and Cholesky factorizations, and they are a bit slow when applied to many small matrices (200 x 200). For instance, the same operation is twice as fast in TensorFlow, and some research suggests the implementation can matter a lot, up to a factor of 10.
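For concreteness, here is a minimal sketch of the kind of batched call I mean, assuming a recent PyTorch where the `torch.linalg` API is available (older releases expose `torch.cholesky` / `torch.lu` instead); the batch size is just illustrative:

```python
import torch

# A batch of many small matrices (200 x 200, as in my use case; batch size illustrative).
batch, n = 1000, 200
a = torch.randn(batch, n, n, device="cuda")
spd = a @ a.transpose(-2, -1) + n * torch.eye(n, device="cuda")  # make SPD for Cholesky

chol = torch.linalg.cholesky(spd)       # batched Cholesky: (batch, n, n) lower factors
lu, pivots = torch.linalg.lu_factor(a)  # batched LU: packed factors plus pivot indices
```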

Hi,

IIRC the binaries ship with MKL for CPU and MAGMA for GPU.

Thanks! I had actually hoped not. As far as I understand, MAGMA is really good, and yet for LU factorization I only get about 30 GFLOP/s on a card that can do 10 TFLOP/s.
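For reference, this is roughly how I arrive at that figure; the batch size and measured time below are illustrative, not exact measurements:

```python
# One LU factorization of an n x n matrix costs roughly (2/3) * n^3 flops.
n, batch = 200, 1000
flops = (2 / 3) * n**3 * batch          # ~5.3e9 flops for the whole batch
measured_seconds = 0.18                 # illustrative wall-clock time for the batch
print(f"{flops / measured_seconds / 1e9:.1f} GFLOP/s")  # ~30 GFLOP/s
```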

Hi,

Note that many small matrices is usually the worst-case scenario for GPU performance, so I wouldn’t be surprised that you are nowhere near the card’s theoretical peak throughput.
Is it better with larger matrices?
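A quick way to check, with the same total flop count split differently (assuming CUDA and a PyTorch with the `torch.linalg` API):

```python
import time
import torch

def time_batched_lu(batch, n, iters=10):
    a = torch.randn(batch, n, n, device="cuda")
    torch.linalg.lu_factor(a)  # warm-up so allocation/initialization isn't timed
    torch.cuda.synchronize()   # CUDA launches are async; sync before and after timing
    t0 = time.perf_counter()
    for _ in range(iters):
        torch.linalg.lu_factor(a)
    torch.cuda.synchronize()
    return (time.perf_counter() - t0) / iters

# 1000 * 200^3 == 1 * 2000^3, so both runs do the same number of flops.
print("many small matrices:", time_batched_lu(batch=1000, n=200))
print("one large matrix   :", time_batched_lu(batch=1, n=2000))
```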