I have a question about running multiple convolution operations in parallel.
Given B x C x H x W input features, I want to compute a C -> 1 spatial convolution M times.
So the code looks like:
for i in range(M): output[i] = F.conv2d(input[i], weight[i], padding=1)
Each weight[i] is a 1 x C x 3 x 3 tensor.
However, this loop is much slower than a single C -> M spatial convolution, which has the same computational complexity.
Is there a good way to speed this up?
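
To make the setup concrete, here is a minimal runnable sketch of the loop above, next to one possible single-call formulation using the `groups` argument of `F.conv2d`. It assumes that `input[i]` are M separate B x C x H x W tensors, which the snippet above does not state explicitly. The sizes and the names `inputs`, `weights`, `stacked`, and `grouped_weight` are made up for illustration. Whether the grouped call is actually faster depends on the backend and the shapes, so it would need to be benchmarked.

```python
import torch
import torch.nn.functional as F

# Hypothetical sizes, chosen only for illustration.
B, C, H, W, M = 2, 4, 8, 8, 3

torch.manual_seed(0)
# Assumption: input[i] are M separate B x C x H x W feature maps.
inputs = torch.randn(M, B, C, H, W)
# One 1 x C x 3 x 3 kernel per convolution, as in the question.
weights = torch.randn(M, 1, C, 3, 3)

# The loop from the question: M separate C -> 1 convolutions,
# concatenated along the channel dimension into B x M x H x W.
looped = torch.cat(
    [F.conv2d(inputs[i], weights[i], padding=1) for i in range(M)], dim=1
)

# Candidate replacement: one grouped convolution.
# Stack the M inputs along channels: B x (M*C) x H x W,
# so that channels [i*C, (i+1)*C) belong to input i.
stacked = inputs.permute(1, 0, 2, 3, 4).reshape(B, M * C, H, W)
# With groups=M the weight shape is (out_channels=M, in_channels/groups=C, 3, 3).
grouped_weight = weights.reshape(M, C, 3, 3)
grouped = F.conv2d(stacked, grouped_weight, padding=1, groups=M)

# Compare the two results numerically.
print(torch.allclose(looped, grouped, atol=1e-4))
```

If all M convolutions instead share the same input tensor, the M kernels can simply be concatenated into one M x C x 3 x 3 weight and applied as an ordinary C -> M convolution.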