Suppose that we have an A x B x C x D tensor T. Let T(:,:,c,d) be the A x B matrix at the c-th and d-th slice of T, for c = 1,2,…,C and d = 1,2,…,D. For example, T could be a tensor of weights or gradients computed in a deep network.

I would like to ask how we can use a GPU-based batched implementation of several linear algebra operations, such as QR, SVD, matrix multiplication, matrix inverse and matrix exponential, on T(:,:,c,d) for each c and d, using the existing PyTorch classes/methods (i.e. using torch tensors, without overloading most of the PyTorch methods), if possible.

I have implemented these batched operations in C/C++ using cuBLAS and some external libraries. So, alternatively, could you please suggest a procedure for integrating these C/C++ implementations into PyTorch so that they process torch tensors directly (without any variable conversions etc., to avoid increasing the running time)?

I can’t help you much, but PyTorch can be compiled with MAGMA, a linear algebra library for GPUs that includes QR, SVD, eigenvalue routines and BLAS functions (matmul, transpose).

I am not sure where/how it’s used though. — Edit: there
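For readers on a more recent PyTorch release, a note on the operations listed in the question: the torch.linalg functions and torch.matmul treat all leading dimensions as batch dimensions, so the slices T(:,:,c,d) can be handled in one call once the two matrix dimensions are moved to the end. The following is only a minimal sketch under that assumption (a PyTorch version that ships torch.linalg); the sizes and variable names are illustrative and do not come from the posts above.

```python
import torch

# Illustrative sizes (assumption); A == B so that inverse and exponential are defined.
A, B, C, D = 4, 4, 3, 2
device = "cuda" if torch.cuda.is_available() else "cpu"
T = torch.randn(A, B, C, D, device=device, dtype=torch.float64)

# Move the matrix dimensions last: M[c, d] is the A x B matrix T[:, :, c, d].
M = T.permute(2, 3, 0, 1).contiguous()

Q, R = torch.linalg.qr(M)                            # batched QR
U, S, Vh = torch.linalg.svd(M, full_matrices=False)  # batched SVD
P = M @ M.transpose(-2, -1)                          # batched matrix product
E = torch.linalg.matrix_exp(M)                       # batched matrix exponential

# P + I is symmetric positive definite, so it is safe to invert in a demo.
W = P + torch.eye(A, device=device, dtype=T.dtype)
W_inv = torch.linalg.inv(W)                          # batched inverse

# Results can be moved back to the original A x B x C x D layout if needed.
E_original_layout = E.permute(2, 3, 0, 1)
```

Whether the permute plus contiguous copy is acceptable depends on the tensor sizes; storing the data as C x D x A x B from the start avoids it.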

For anyone who needs a workaround in the meantime (I got here from googling), I have implemented and regularly use these hacks:

import torch

def batch_apply(fn, *inputs):
    # Split every input along dim 0, call fn on the matching slices, and stack the results.
    return torch.stack([fn(*(a[0] for a in args)) for args in zip(*(inp.split(1) for inp in inputs))])

# Specific versions for readability:
def batch_apply1(fn, inp1):
    return torch.stack([fn(i[0]) for i in inp1.split(1)])

def batch_apply2(fn, inp1, inp2):
    return torch.stack([fn(i1[0], i2[0]) for (i1, i2) in zip(inp1.split(1), inp2.split(1))])

And use them like this:

v = Variable(torch.randn(3, 5))
t = Variable(torch.randn(3, 5, 5))
batch_apply(lambda v, t: torch.potrs(v, t)[:,0], v, t) # Variable containing ... [torch.FloatTensor of size (3,5)]
batch_apply(lambda v: torch.diag(v), v) # Variable containing ... [torch.FloatTensor of size (3,5,5)]
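A note for anyone running this on a newer PyTorch: Variable has since been merged into Tensor, and torch.potrs was renamed to torch.cholesky_solve. The sketch below is my own adaptation of the usage above, not the original poster's code. It assumes a PyTorch version with torch.linalg, builds a symmetric positive definite batch purely for illustration, and compares the looped helper against the natively batched calls.

```python
import torch

def batch_apply(fn, *inputs):
    # Same helper as above: loop over dim 0 and stack the per-item results.
    return torch.stack([fn(*(piece[0] for piece in args))
                        for args in zip(*(inp.split(1) for inp in inputs))])

torch.manual_seed(0)
v = torch.randn(3, 5, dtype=torch.float64)
a = torch.randn(3, 5, 5, dtype=torch.float64)

# Illustrative SPD batch (assumption): a @ a^T + 5 I, then its lower Cholesky factors.
spd = a @ a.transpose(-2, -1) + 5 * torch.eye(5, dtype=torch.float64)
chol = torch.linalg.cholesky(spd)

# Looped version via the helper; cholesky_solve expects a column right-hand side.
looped = batch_apply(
    lambda b, l: torch.cholesky_solve(b.unsqueeze(-1), l)[:, 0], v, chol)

# Natively batched version: leading dimensions are treated as the batch.
batched = torch.cholesky_solve(v.unsqueeze(-1), chol).squeeze(-1)

# torch.diag per item versus the batched torch.diag_embed.
diag_looped = batch_apply(torch.diag, v)
diag_batched = torch.diag_embed(v)
```

Where a natively batched call exists it avoids the Python-level loop, so the helper is mainly useful for functions that still lack batch support.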