Multiple convolutions performance

I have a problem with running multiple convolution operations in parallel.

Given B x C x H x W input features, I want to compute a C -> 1 spatial convolution M times.

So the code looks like this:

import torch.nn.functional as F

for i in range(M):
    output[i] = F.conv2d(input[i], weight[i], padding=1)  # B x 1 x H x W

Each weight[i] is a 1 x C x 3 x 3 tensor.

However, this loop is much slower than a single C -> M spatial convolution, which has the same computational complexity.
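As a quick sanity check of the "same complexity" claim, this sketch counts multiply-accumulates (MACs) for both setups, assuming stride 1, padding=1 (so the output stays H x W) and no bias; the sizes are hypothetical. The counts are equal, so the slowdown presumably comes from overhead rather than arithmetic: M small kernel launches, each with little work to parallelize, versus one larger launch.

```python
def conv_macs(batch, in_ch, out_ch, height, width, k=3):
    """Multiply-accumulates for a stride-1, 'same'-padded k x k convolution (no bias)."""
    # Every output element needs in_ch * k * k multiply-adds.
    return batch * out_ch * height * width * in_ch * k * k


B, C, H, W, M = 8, 64, 32, 32, 16  # hypothetical sizes for illustration

loop_macs = M * conv_macs(B, C, 1, H, W)  # M separate C -> 1 convolutions
single_macs = conv_macs(B, C, M, H, W)    # one C -> M convolution
print(loop_macs == single_macs)  # True
```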

Is there a good way to speed this up?

Thank you.

Why not just do a single C -> M spatial convolution and then separate out the channels of the output?

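import torch.nn.functional as F

Moutput = F.conv2d(input, Mweight, padding=1)  # Mweight: M x C x 3 x 3, Moutput: B x M x H x W
for i in range(M):
    output[i] = Moutput[:, i:i + 1]  # keep the channel dim: B x 1 x H x W, like a C -> 1 conv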
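A minimal check of this suggestion, using made-up sizes and random tensors: when all M filters see the same input, channel i of the single C -> M output matches the i-th C -> 1 convolution, up to floating-point tolerance.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W, M = 2, 4, 8, 8, 3  # hypothetical sizes

x = torch.randn(B, C, H, W)        # one input shared by all M filters
Mweight = torch.randn(M, C, 3, 3)  # M stacked 1 x C x 3 x 3 filters

# One C -> M convolution ...
Moutput = F.conv2d(x, Mweight, padding=1)  # B x M x H x W

# ... versus M separate C -> 1 convolutions on the same input.
loop_out = [F.conv2d(x, Mweight[i:i + 1], padding=1) for i in range(M)]  # each B x 1 x H x W

print(all(torch.allclose(Moutput[:, i:i + 1], loop_out[i], atol=1e-4) for i in range(M)))  # True
```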

Oh, sorry, there was a typo.

All the inputs are different: input[i] is a separate B x C x H x W tensor for each i.

Anyway, thank you for trying to help! Have a nice day.
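With different inputs, the single C -> M convolution no longer applies directly. One possible approach, not proposed in this thread, is a grouped convolution via the `groups` argument of `F.conv2d`: concatenate the M inputs along the channel axis into a B x (M*C) x H x W tensor and set groups=M, so output channel i only sees the C channels that came from input[i]. This sketch assumes the inputs and weights can be stacked into single tensors (M x B x C x H x W and M x 1 x C x 3 x 3); the sizes are hypothetical, and the result is checked against the original loop.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W, M = 2, 4, 8, 8, 3  # hypothetical sizes

inputs = torch.randn(M, B, C, H, W)   # input[i]: a different B x C x H x W tensor per i
weights = torch.randn(M, 1, C, 3, 3)  # weight[i]: 1 x C x 3 x 3

# Reference: the original loop of M separate C -> 1 convolutions.
loop_out = torch.stack(
    [F.conv2d(inputs[i], weights[i], padding=1) for i in range(M)]
)  # M x B x 1 x H x W

# Grouped version: lay the M inputs side by side along the channel axis ...
x = inputs.permute(1, 0, 2, 3, 4).reshape(B, M * C, H, W)  # B x (M*C) x H x W
w = weights.reshape(M, C, 3, 3)                             # M x C x 3 x 3
# ... so group i only sees channels i*C .. (i+1)*C - 1, i.e. input[i].
grouped = F.conv2d(x, w, padding=1, groups=M)               # B x M x H x W

# Rearrange to M x B x 1 x H x W to compare with the loop.
grouped = grouped.permute(1, 0, 2, 3).unsqueeze(2)
print(torch.allclose(grouped, loop_out, atol=1e-4))  # True
```

Whether this is actually faster depends on the hardware and backend, since grouped convolutions do not always run as efficiently as dense ones, and the permute/reshape step copies the input. Timing both versions at the real sizes is the safest way to decide.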