# Trying to get rid of for loop over vector-matrix pairs

Hi, I have a nasty for loop that I am trying to get around in my implementation:

Given:
`A` - a list of N vectors
`B` - a list of N matrices
The outer dimension of each vector-matrix pair is the same (equal to `1` for the vectors and `M` for the matrices), but the inner dimension differs from pair to pair. I want to multiply each pair and stack the resulting vectors so that I get a matrix `C` of size `[N, M]`:

```
C = torch.zeros(0, M)
for i in range(N):
    Ci = torch.mm(A[i], B[i])   # (1, M)
    C = torch.cat((C, Ci), 0)
```
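For concreteness, here is a self-contained version of that loop with made-up sizes (the values of `N`, `M`, and the per-pair inner dimensions are assumptions, just to make it runnable):

```python
import torch

torch.manual_seed(0)
N, M = 4, 3
inner = [2, 5, 1, 4]   # assumed inner dimension of each pair

A = [torch.randn(1, inner[i]) for i in range(N)]    # vectors of shape (1, P_i)
B = [torch.randn(inner[i], M) for i in range(N)]    # matrices of shape (P_i, M)

C = torch.zeros(0, M)
for i in range(N):
    Ci = torch.mm(A[i], B[i])    # (1, P_i) @ (P_i, M) -> (1, M)
    C = torch.cat((C, Ci), 0)

print(C.shape)   # torch.Size([4, 3])
```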

I was wondering if it is possible to get rid of the for loop and somehow parallelise the computation of `C`.

One way to address this would be to build a single matrix from the list of vectors `A` by zero-padding each vector:

```
P = sum(A[i].size(1) for i in range(N))   # total inner dimension over all pairs
ABatch = torch.zeros(0, P)
pos = 0
for i in range(N):
    AiTemp = torch.cat((torch.zeros(1, pos), A[i]), 1)
    pos += A[i].size(1)
    Ai = torch.cat((AiTemp, torch.zeros(1, P - pos)), 1)
    ABatch = torch.cat((ABatch, Ai), 0)
```

and concatenate each matrix in the list `B` along dimension 0 to get `BBatch` of size `[P, M]`.
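That concatenation is a one-liner, assuming `B` is a Python list of tensors (the sizes below are made up):

```python
import torch

M = 3
B = [torch.randn(2, M), torch.randn(5, M), torch.randn(4, M)]  # assumed sizes (P_i, M)
BBatch = torch.cat(B, 0)    # stack along dim 0 -> (P, M) with P = 2 + 5 + 4
print(BBatch.shape)         # torch.Size([11, 3])
```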

This approach helps because, once I have `ABatch` of size `[N, P]` and `BBatch` of size `[P, M]`, I can avoid the for loop in the forward pass:

```
C = torch.mm(ABatch, BBatch)
```
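As a sanity check that the padded formulation matches the loop: since each `A[i]` occupies its own row and its own column block, `ABatch` is exactly the block-diagonal of the `A[i]`, so `torch.block_diag` (available in recent PyTorch) can build it without the manual padding loop. Sizes here are assumptions for illustration:

```python
import torch

torch.manual_seed(0)
N, M = 3, 4
inner = [2, 5, 3]   # assumed per-pair inner dimensions
A = [torch.randn(1, p) for p in inner]
B = [torch.randn(p, M) for p in inner]

# loop version
C_loop = torch.cat([torch.mm(A[i], B[i]) for i in range(N)], 0)

# padded version: ABatch is block-diagonal in the A[i]
ABatch = torch.block_diag(*A)   # (N, P)
BBatch = torch.cat(B, 0)        # (P, M)
C_batch = torch.mm(ABatch, BBatch)

print(torch.allclose(C_loop, C_batch))   # True
```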

However, it would be great if there were an alternative that avoids allocating the unnecessarily large tensor `ABatch`.