Any way to compute torch.eig in batch or in an async way on a GPU?

I wonder if there’s a batch version of torch.eig, or a way to call it asynchronously on a GPU?
For example,

import torch

aaa = torch.randn((5, 5))
mat =, aaa)
# note: the old async=True kwarg was renamed to non_blocking=True
mat = mat.pin_memory().cuda(non_blocking=True)
for ii in range(1000):
    torch.eig(mat, eigenvectors=True)

How can I run these 1000 torch.eig computations simultaneously on GPU?

The CUDA kernel queue is of size 1023, I think. So you can probably run 10 or 100 of these asynchronously, but possibly not 1000 (each .eig call might launch multiple kernels).
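One way to actually queue independent decompositions so the GPU can overlap them is CUDA streams. A minimal sketch, using the newer `torch.linalg.eig` API (torch.eig was later removed) and falling back to plain sequential calls when no CUDA device is available; `eig_async` is a hypothetical helper name, not a PyTorch function:

```python
import torch

def eig_async(mats):
    # Queue each eig call on its own CUDA stream so the GPU scheduler
    # may overlap them; synchronize once at the end.
    if torch.cuda.is_available():
        streams = [torch.cuda.Stream() for _ in mats]
        results = []
        for m, s in zip(mats, streams):
                results.append(torch.linalg.eig(m.cuda(non_blocking=True)))
        torch.cuda.synchronize()  # wait for all streams to finish
        return results
    # CPU fallback: just run the decompositions sequentially
    return [torch.linalg.eig(m) for m in mats]

vals, vecs = zip(*eig_async([torch.randn(5, 5) for _ in range(8)]))
```

Whether the calls truly overlap still depends on each one's kernel launches and available GPU resources; streams only remove the false serialization on the default stream.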

Cool, thanks! That’s good to know. Am I calling it the right way, or does a new function need to be written?
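For what it’s worth, later PyTorch releases do ship a batched eigendecomposition: `torch.linalg.eig` accepts a stack of matrices and decomposes them in one call, which sidesteps the per-call launch overhead entirely. A small sketch:

```python
import torch

# torch.linalg.eig (the replacement for the removed torch.eig)
# operates on batched input: here, 1000 independent 5x5 matrices.
batch = torch.randn(1000, 5, 5)
eigvals, eigvecs = torch.linalg.eig(batch)  # one call for the whole batch
# eigvals: complex tensor of shape (1000, 5)
# eigvecs: complex tensor of shape (1000, 5, 5)
```

On a CUDA tensor this runs the whole batch on the GPU without needing any manual stream management.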