Generalized broadcasting

Are there any built-in methods for generalized broadcasting in tensor operations? I’m specifically looking for a kind of ‘block’ broadcasting that would allow, for example, elementwise multiplication of a tensor of shape (m,n) with another tensor of shape (3m,2n), where each element of the first tensor is multiplied by the corresponding (3,2) block of the second.

I just found a solution here: Partitioned matrix multiplication in tensorflow or pytorch - Stack Overflow, but other suggestions are welcome.
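
To make the operation concrete, here is a minimal PyTorch sketch of the block multiplication described above. It is only one possible approach, not necessarily the one in the linked answer: the helper name `block_broadcast_mul` is hypothetical, and it assumes both tensors are 2-D and that the larger shape is an exact integer multiple of the smaller one.

```python
import torch

def block_broadcast_mul(a, b):
    """Multiply each a[i, j] by the matching block of b (hypothetical helper)."""
    m, n = a.shape
    M, N = b.shape
    if M % m or N % n:
        raise ValueError("b's shape must be an integer multiple of a's shape")
    p, q = M // m, N // n  # block height and width, e.g. (3, 2)
    # View b as (m, p, n, q) so that block (i, j) is b4[i, :, j, :].
    b4 = b.reshape(m, p, n, q)
    # Give a singleton block axes so ordinary broadcasting does the rest.
    out = a.reshape(m, 1, n, 1) * b4
    return out.reshape(M, N)

a = torch.tensor([[1.0, 2.0], [3.0, 4.0]])  # shape (2, 2)
b = torch.ones(6, 4)                        # shape (6, 4), i.e. (3, 2) blocks
c = block_broadcast_mul(a, b)
```

The same result should also be obtainable by expanding the smaller tensor first, for example `torch.kron(a, torch.ones(3, 2)) * b`, at the cost of materializing an expanded copy of `a`.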