I have a tensor A of shape [batch_size, N, M, D] and another tensor B of shape [batch_size, P, D]. I want to compute a tensor C of shape **[batch_size, N, M, P]** as follows:

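C[i, j] = matrix_dot_product(A[i, j], transpose(B[i])), for 0 <= i < batch_size and 0 <= j < N

Here A[i, j] has shape [M, D] and B[i] has shape [P, D], so B[i] has to be transposed for each C[i, j] to come out with shape [M, P].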
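To make the formula above concrete, here is a minimal sketch, assuming NumPy (the question does not name a framework) and assuming B[i] is transposed so the shapes line up. The function names `batched_product_loop` and `batched_product_einsum` are hypothetical, chosen only for illustration. The loop version spells out the definition; the `einsum` version is one possible way to express the same contraction over D without explicitly tiling B across the N axis.

```python
import numpy as np


def batched_product_loop(A, B):
    """Reference version: follows the per-(i, j) definition literally."""
    batch_size, N, M, D = A.shape
    P = B.shape[1]
    C = np.empty((batch_size, N, M, P), dtype=np.result_type(A, B))
    for i in range(batch_size):
        for j in range(N):
            # A[i, j] is [M, D]; B[i].T is [D, P]; the product is [M, P].
            C[i, j] = A[i, j] @ B[i].T
    return C


def batched_product_einsum(A, B):
    """Same contraction over the shared D axis, written as one einsum call."""
    # b = batch, n = N, m = M, p = P, d = D (summed over).
    return np.einsum("bnmd,bpd->bnmp", A, B)
```

Both functions should return an array of shape [batch_size, N, M, P]. Which variant actually uses less memory depends on the framework and its backend, so it is worth measuring on the real shapes.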

What is the most memory-efficient way to do this?

Thanks!