I have two lists of matrices. The first one has shapes [(N, n1), (N, n2), (N, n3)]. The second one has shapes [(n1, B), (n2, B), (n3, B)]. I would like to multiply the two lists pairwise to obtain [(N, B), (N, B), (N, B)]. Is it possible to do this in PyTorch without looping through the lists?

You can pad these matrices with zeros and concatenate them to form one big tensor. Then you'll be able to perform a single matrix multiply and slice the result to get the corresponding sub-matrix results. However, I'm not sure this is the best way to do it; it will depend on the values of n1, n2, etc. Maybe looping is better.
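A minimal sketch of the padding idea, assuming illustrative shapes (N, B and the n values below are made up): concatenate the left factors along their columns, build a block-diagonal matrix from the right factors with `torch.block_diag` (the zero padding happens implicitly), do one matmul, and split the columns back out.

```python
import torch

# Hypothetical sizes for illustration
N, B = 4, 5
ns = [2, 3, 6]

As = [torch.randn(N, n) for n in ns]   # [(N, n1), (N, n2), (N, n3)]
Bs = [torch.randn(n, B) for n in ns]   # [(n1, B), (n2, B), (n3, B)]

# Concatenate the left factors along dim=1: (N, n1+n2+n3)
A = torch.cat(As, dim=1)

# Block-diagonal right factor: (n1+n2+n3, 3*B), off-diagonal blocks are zero
Bd = torch.block_diag(*Bs)

# One matmul, then split the columns into three (N, B) results
out = (A @ Bd).split(B, dim=1)

# Sanity check against the straightforward loop
ref = [a @ b for a, b in zip(As, Bs)]
assert all(torch.allclose(o, r, atol=1e-5) for o, r in zip(out, ref))
```

The trade-off is the wasted work on the zero blocks: the big matmul costs O(N * (n1+n2+n3) * 3B) instead of O(N * (n1+n2+n3) * B), which is why this only pays off when the n values are similar.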

If you really need performance and you have big matrices, you'll need to implement your own CUDA kernel; that, I think, is the best you can do.

Thanks for your reply! In my case, n1, n2, … may differ a lot, so I think zero padding may not be efficient. Since I also need autograd in PyTorch, implementing a CUDA kernel would not be easy. I guess a for loop may be the only choice.
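For what it's worth, the looped version is short and autograd handles it with no extra work, since each matmul is tracked independently. A sketch with made-up sizes:

```python
import torch

# Hypothetical sizes for illustration
N, B = 4, 5
ns = [2, 3, 6]

As = [torch.randn(N, n, requires_grad=True) for n in ns]
Bs = [torch.randn(n, B, requires_grad=True) for n in ns]

# Plain list comprehension: three independent (N, B) products
outs = [a @ b for a, b in zip(As, Bs)]

# Gradients flow through each product as usual
loss = sum(o.sum() for o in outs)
loss.backward()
```

With only three matrices in each list, the Python-loop overhead is negligible next to the matmuls themselves, so this is a reasonable default.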