Hi, I have a nasty for loop that I am trying to get around in my implementation:

Given:

- `A`: a list of N vectors
- `B`: a list of N matrices

The outer dimensions of each vector-matrix pair are the same (`1` for the vector and `M` for the matrix), but the inner dimension differs from pair to pair. I want to multiply each pair and stack the resulting vectors so that I get a matrix `C` of size `[N, M]`:

```
C = torch.zeros(0, M)
for i in range(N):
    Ci = torch.mm(A[i], B[i])   # [1, M]
    C = torch.cat((C, Ci), 0)   # grow C one row at a time
```
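For concreteness, here is a self-contained runnable version of the loop with made-up sizes (N = 3, M = 5, inner dimensions 2, 3, 4):

```python
import torch

N, M = 3, 5
dims = (2, 3, 4)                       # made-up inner dimensions, one per pair
A = [torch.randn(1, p) for p in dims]  # N vectors of shape [1, p]
B = [torch.randn(p, M) for p in dims]  # N matrices of shape [p, M]

C = torch.zeros(0, M)
for i in range(N):
    Ci = torch.mm(A[i], B[i])          # [1, M]
    C = torch.cat((C, Ci), 0)          # C ends up with shape [N, M]
```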

I was wondering if it is possible to get rid of the for loop and somehow parallelise the computation of `C`.

One way to address the issue would be to build a single matrix from the list of vectors `A` by padding with zeros:

```
P = sum(a.size(1) for a in A)  # total inner dimension over all pairs
ABatch = torch.zeros(0, P)
pos = 0
for i in range(N):
    # place A[i] at column offset `pos`, zero-padded on both sides
    AiTemp = torch.cat((torch.zeros(1, pos), A[i]), 1)
    pos += A[i].size(1)
    Ai = torch.cat((AiTemp, torch.zeros(1, P - pos)), 1)
    ABatch = torch.cat((ABatch, Ai), 0)
```
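As a side note, the zero-padded matrix built above is exactly a block-diagonal layout of the `A[i]` blocks, so on reasonably recent PyTorch versions it can be built in one call with `torch.block_diag` (a sketch with made-up inner dimensions 2, 3, 4):

```python
import torch

# toy data: N = 3 row vectors with inner dimensions 2, 3, 4 (made-up sizes)
A = [torch.randn(1, p) for p in (2, 3, 4)]

# each A[i] becomes one row, placed at its own column offset,
# zeros everywhere else: shape [3, 9]
ABatch = torch.block_diag(*A)
```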

and concatenate the matrices in the list `B` along the first dimension to get `BBatch` of size `[P, M]`.

This approach helps because, once I have obtained `ABatch` of size `[N, P]` and `BBatch` of size `[P, M]`, I can avoid the for loop in forward prop:

```
C = torch.mm(ABatch, BBatch)
```
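For what it's worth, a minimal end-to-end sketch (with made-up sizes N = 3, M = 5 and `torch.block_diag` used to build the padded matrix) checking that the single batched product matches the per-pair loop:

```python
import torch

torch.manual_seed(0)
N, M = 3, 5
dims = (2, 3, 4)                       # made-up inner dimensions
A = [torch.randn(1, p) for p in dims]
B = [torch.randn(p, M) for p in dims]

# loop version: one mm per pair, stacked along dim 0
C_loop = torch.cat([torch.mm(a, b) for a, b in zip(A, B)], 0)

# batched version: zero-padded A blocks times stacked B matrices
ABatch = torch.block_diag(*A)          # [N, P]
BBatch = torch.cat(B, 0)               # [P, M]
C = torch.mm(ABatch, BBatch)           # [N, M]

assert torch.allclose(C, C_loop)
```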

However, it would be great if there were an alternative that avoids building the unnecessarily large (and mostly zero) tensor `ABatch`.