Hi
I have an input tensor of shape n*p, where p = k*q: the p columns form q groups of k consecutive feature columns.

I also have a weight tensor of shape k*1. Right now I use a for loop that multiplies each group of k columns of the input by the weight, giving one n*1 result per group. It is slow.
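For reference, here is a minimal sketch of what such a loop might look like. It is written in NumPy because the post does not name a framework, and it assumes each group occupies contiguous columns j*k to j*k + k - 1. The function name `grouped_matmul_loop` is hypothetical.

```python
import numpy as np

def grouped_matmul_loop(x, w):
    """Hypothetical loop version: multiply each group of k columns of x by w.

    x: shape (n, p) with p = k * q, groups assumed to be contiguous columns
    w: shape (k, 1)
    returns: shape (n, q), one column per group
    """
    n, p = x.shape
    k = w.shape[0]
    assert p % k == 0, "p must be a multiple of k"
    q = p // k
    out = np.empty((n, q), dtype=np.result_type(x, w))
    for j in range(q):
        # Columns j*k .. j*k + k - 1 form group j; (n, k) @ (k, 1) -> (n, 1).
        out[:, j] = (x[:, j * k:(j + 1) * k] @ w)[:, 0]
    return out

rng = np.random.default_rng(0)
n, k, q = 4, 3, 5
x = rng.standard_normal((n, k * q))
w = rng.standard_normal((k, 1))
y_loop = grouped_matmul_loop(x, w)
print(y_loop.shape)  # (4, 5)
```

Each iteration launches a separate small matmul, which is why this gets slow as q grows.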

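I think vectorization could speed this up by turning the loop into a single matrix multiplication, but I got stuck on how to convert the k*1 weight tensor into a p*q tensor. The new weight tensor has a specific pattern (see the following image): column j holds the weight in the k rows belonging to group j and zeros everywhere else, i.e. the weight repeated along the diagonal of a block-diagonal matrix.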
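One way to build that p*q block-diagonal weight is a Kronecker product of a q*q identity with the k*1 weight. The sketch below uses NumPy's `np.kron` and assumes contiguous groups; `expand_weight` is a hypothetical helper name. PyTorch also provides `torch.kron` and `torch.block_diag`, so the same idea should carry over, but check against your framework's docs.

```python
import numpy as np

def expand_weight(w, q):
    """Hypothetical helper: build the (k*q, q) block-diagonal weight from a (k, 1) weight.

    kron(I_q, w)[j*k + a, j] == w[a, 0]; all other entries are zero,
    so x @ W computes all q group products in one matmul.
    """
    return np.kron(np.eye(q, dtype=w.dtype), w)

rng = np.random.default_rng(0)
n, k, q = 4, 3, 5
x = rng.standard_normal((n, k * q))
w = rng.standard_normal((k, 1))

W = expand_weight(w, q)  # shape (p, q) == (15, 5)
y = x @ W                # shape (n, q) == (4, 5)

# Reference: explicit per-group products, stacked as columns.
y_ref = np.stack([x[:, j * k:(j + 1) * k] @ w[:, 0] for j in range(q)], axis=1)
print(W.shape, y.shape, np.allclose(y, y_ref))  # (15, 5) (4, 5) True
```

Note that W has p*q entries but only p of them are nonzero, so this does roughly q times more multiply-adds than strictly needed; for large q the reshape approach can be cheaper.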

An alternative is to reshape the input and keep the shape of the weight unchanged, but then I need to reshape the product back. So I am still thinking about how to reorganize the weight instead; maybe I should use a binary mask.
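Both ideas can be sketched in NumPy, again assuming contiguous column groups and row-major (C-order) reshapes. Option A reshapes the input to (n, q, k) so that one batched matmul with the unchanged k*1 weight handles every group; "reshaping back" then only means dropping the trailing size-1 axis. Option B builds the block-diagonal weight from a binary mask times a tiled copy of the weight. The variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, q = 4, 3, 5
p = k * q
x = rng.standard_normal((n, p))
w = rng.standard_normal((k, 1))

# Option A: reshape the input, keep w as (k, 1).
# A row-major reshape maps x[i, j*k + a] to x3[i, j, a] (group j, feature a).
x3 = x.reshape(n, q, k)
y_reshape = (x3 @ w).reshape(n, q)  # (n, q, 1) -> (n, q)

# Option B: binary mask.
# mask[r, j] == 1 exactly when row r belongs to group j.
mask = np.kron(np.eye(q), np.ones((k, 1)))  # shape (p, q)
W = mask * np.tile(w, (q, 1))               # (p, 1) tiled weight broadcast over q columns
y_mask = x @ W                              # shape (n, q)

print(np.allclose(y_reshape, y_mask))  # True
```

Option A avoids materializing the mostly-zero p*q weight, while Option B keeps everything as one plain 2D matmul, which can be convenient if the weight is reused as a layer parameter.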