# Most efficient way to perform multiplication of list of tensors?

Hello everyone,

I am wondering what is the most efficient way to perform multiplication of tensors contained in a sublist within two lists.

Let us say that I have a list `A` containing sublists `a1,...ak` containing tensors of different sizes within each list, e.g. `a1 = [a11, a12, a13, a14]` where

```
a11.size() == torch.Size([128, 274])
a12.size() == torch.Size([1, 128])
a13.size() == torch.Size([256, 128])
a14.size() == torch.Size([1, 256])
```

but note that each list within `A` contains tensors that have the same sizes as those in `a1`.

Now consider a quite similar list, `B = [b1,..., bn]`, where each sublist `bi` contains the same number of tensors as those in the sublist of `A`. Moreover, the inner tensors have the same shapes as those in each sublist `aj`.

I am looking to “multiply” each sublist in `A` with each sublist in `B`, so that I get a list `C` containing `k` lists, each holding the products of a sublist `aj` with every sublist in `B`. Formally,

```
C = [ [a1 * b1, a1 * b2, ..., a1 * bn], [a2 * b1, a2 * b2, ..., a2 * bn], ..., ]
```

where `a1 * b1` denotes `[a11 * b11, a12 * b12, a13 * b13, a14 * b14]`.

However, I cannot multiply lists of tensors directly; for instance, trying `a1 * b1` raises the following error:

```
TypeError: can't multiply sequence by non-int of type 'list'
```
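The error itself is just Python list semantics: `*` on a list means sequence repetition, which only accepts an `int`. A minimal reproduction (toy shapes instead of the ones above):

```python
import torch

# `a1` and `b1` are plain Python lists, so `*` attempts sequence
# repetition, which only accepts an int — hence the TypeError.
a1 = [torch.randn(2, 3), torch.randn(1, 2)]
b1 = [torch.randn(2, 3), torch.randn(1, 2)]
try:
    a1 * b1
except TypeError as e:
    print(e)  # can't multiply sequence by non-int of type 'list'
```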

Thus my guess is that the format in which I am storing my tensors is probably not optimal.

Is the best way to perform such a computation the following (note the nested loops over `A` and `B`, since every `aj` must be paired with every `bi`)

```
c = [[[torch.mul(ajk, bik) for ajk, bik in zip(aj, bi)] for bi in B] for aj in A]
```

or is there a more efficient way to do so (e.g. storing the tensors in something other than a list)?

it is best to have the multiplicands in a single memory block (buffer), but this is often impossible to arrange, as copying is not efficient and the `out` argument breaks gradients. if you mean matrix multiplication, you can’t “pack” tensors with varying shapes either.

apart from that, you can only reduce invocation overhead a bit (e.g. with the JIT), but this will likely be insignificant unless you’re dealing with hundreds of small tensors.
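To illustrate the JIT suggestion, a hedged sketch using `torch.jit.script` to trim Python call overhead on the inner loop (`pairwise_mul` is a hypothetical name, not from the thread):

```python
import torch
from typing import List

# Scripting the loop removes some per-call Python overhead; the gain
# is usually small unless there are very many tiny tensors.
@torch.jit.script
def pairwise_mul(xs: List[torch.Tensor],
                 ys: List[torch.Tensor]) -> List[torch.Tensor]:
    out: List[torch.Tensor] = []
    for i in range(len(xs)):
        out.append(xs[i] * ys[i])
    return out

a1 = [torch.randn(128, 274), torch.randn(1, 128)]
b1 = [torch.randn(128, 274), torch.randn(1, 128)]
c1 = pairwise_mul(a1, b1)  # same values as the list-comprehension version
```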

I believe I have found the answer to my question. Since tensors can only be stacked when they have identical sizes, I can simply stack the tensors of identical size across the sublists of `A` and `B` to make 3D tensors.

For instance,

```
A = [torch.stack(x) for x in zip(*A)]
B = [torch.stack(x) for x in zip(*B)]
```

This outputs two lists of `4` 3D tensors whose first dimensions are `k` and `n` respectively. To stay consistent with the post, the first element of `A` now has the following size:

```
A[0].size() == torch.Size([k, 128, 274])
```
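A toy check of the stacking step, with small hypothetical shapes in place of the ones above:

```python
import torch

k = 3  # number of sublists in A
A = [[torch.randn(2, 4), torch.randn(1, 2)] for _ in range(k)]

# zip(*A) groups same-position tensors across sublists; stacking each
# group adds a leading dimension of size k.
A_stacked = [torch.stack(x) for x in zip(*A)]
print(A_stacked[0].size())  # torch.Size([3, 2, 4])
print(A_stacked[1].size())  # torch.Size([3, 1, 2])
```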

Then flattening each tensor over dimensions 1 and 2 and transposing allows a standard, faster matrix multiplication:

```
C = [torch.mm(torch.flatten(a, start_dim=1),
              torch.flatten(b, start_dim=1).transpose(0, 1))
     for a, b in zip(A, B)]
```

This allowed my code to run 6x faster.
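One caveat worth noting: `mm` over flattened tensors gives, for each pair `(i, j)`, the *sum* of the elementwise products (a dot product), not the elementwise product itself. A toy check of the batched form against a per-pair loop:

```python
import torch

torch.manual_seed(0)
k, n = 3, 5
A0 = torch.randn(k, 2, 4)  # stands in for one stacked entry of A
B0 = torch.randn(n, 2, 4)  # stands in for the matching entry of B

batched = torch.mm(torch.flatten(A0, start_dim=1),
                   torch.flatten(B0, start_dim=1).transpose(0, 1))  # [k, n]

# Each entry (i, j) is the sum of the elementwise product A0[i] * B0[j].
looped = torch.stack([torch.stack([(a * b).sum() for b in B0]) for a in A0])
print(torch.allclose(batched, looped))  # True
```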
