Is there any way to perform a vanilla convolution operation but without the final summation? Assume that we have a feature map, X, of size [B, 3, 64, 64] and a single kernel of size [1, 3, 3, 3]. A vanilla convolution gives a feature map of size [B, 1, 62, 62], while I’m after a way to get a feature map of size [B, 3, 62, 62], i.e. the result just before the per-channel responses are collapsed/summed into a single feature map.
How would you like to perform the reduction of each step?
Generally, you could unfold the input into 3x3x3 patches, perform the multiplication with the kernel, (sum the result), and fold/reshape to the output shape. Since you are not performing the sum, you would have overlapping patches and I’m not sure how you would like to reduce/reshape them back.
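The unfold-based idea can be sketched roughly as follows. This is a minimal illustration rather than a definitive implementation: the variable names and the batch size of 2 are assumptions, and it assumes the products are still summed over the 3x3 spatial taps within each channel, so only the sum across channels is skipped. Under that assumption each output position holds one value per channel and no fold step is needed.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C, H, W, K = 2, 3, 64, 64, 3       # B=2 is an arbitrary choice
x = torch.randn(B, C, H, W)
weight = torch.randn(1, C, K, K)      # a single 3x3x3 kernel

# Extract every 3x3 patch: [B, C*K*K, L], L = 62*62 sliding positions.
patches = F.unfold(x, kernel_size=K)
L = patches.shape[-1]

# Separate the channel axis from the K*K spatial taps: [B, C, K*K, L].
patches = patches.reshape(B, C, K * K, L)

# Multiply with the kernel and sum over the spatial taps only (dim=2),
# keeping the channel axis intact.
w = weight.reshape(C, K * K, 1)       # broadcasts over B and L
per_channel = (patches * w).sum(dim=2)                      # [B, C, L]
per_channel = per_channel.reshape(B, C, H - K + 1, W - K + 1)

# Summing over the channel axis should recover the ordinary convolution.
reference = F.conv2d(x, weight)       # [B, 1, 62, 62]
print(per_channel.shape, reference.shape)
```

Here `per_channel.sum(dim=1, keepdim=True)` should match `reference` up to floating-point error.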
I want to avoid the reduction across the channels, not the spatial multiplication and summation within each channel.
So each kernel of size 3x3x3 should give three feature maps, instead of merging them to form a single feature map in the output.
Hi @Saeed_Izadi1, did you find a solution to this problem?
@ptrblck What Saeed was after is the following:
A normal convolution between a [Bx3x5x5] input and a [1x3x3x3] kernel would produce a [Bx1x3x3] response (with stride 1 and no padding). The reason is that after performing the spatial convolution channel-wise, i.e. along the last 2 indices (producing a tensor [Bx1x3x3x3]), the channel-wise responses are summed into a single channel, so the convolution produces a tensor [Bx1x3x3]. Is there a way to obtain access to the tensor [Bx1x3x3x3]?
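One possible way to get at that intermediate tensor is a grouped (depthwise) convolution. The sketch below is an assumption-laden illustration, not necessarily what was posted earlier in the thread: it handles a single kernel only, uses PyTorch's defaults (stride 1, no padding, so a 5x5 input with a 3x3 kernel gives a 3x3 output), and all names are made up for the example.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 2, 3                           # B=2 is an arbitrary choice
x = torch.randn(B, C, 5, 5)
weight = torch.randn(1, C, 3, 3)      # one kernel spanning all 3 channels

# Standard convolution: per-channel responses are summed -> [B, 1, 3, 3].
summed = F.conv2d(x, weight)

# groups=C applies each 3x3 slice of the kernel to its own input channel,
# so nothing is summed across channels. The weight must be [C, 1, 3, 3].
separate = F.conv2d(x, weight.view(C, 1, 3, 3), groups=C)   # [B, C, 3, 3]
separate = separate.unsqueeze(1)                            # [B, 1, C, 3, 3]

print(summed.shape, separate.shape)
```

Summing `separate` over its channel axis (`separate.sum(dim=2)`) should reproduce `summed` up to floating-point error, which is a quick check that the per-channel responses are the ones the standard convolution collapses.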
Wouldn’t my code snippet yield exactly this?
Have a look at this comparison with a manual approach, where each kernel is used on a single input channel:
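A comparison along these lines could be sketched as follows. It assumes the approach being compared is a grouped convolution with `groups` equal to the number of input channels; the tensor sizes follow the first post, and the batch size and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
B, C = 2, 3                           # B=2 is an arbitrary choice
x = torch.randn(B, C, 64, 64)
weight = torch.randn(1, C, 3, 3)      # single kernel of size [1, 3, 3, 3]

# Manual approach: convolve each input channel with its own kernel slice
# and stack the results along the channel dimension.
manual = torch.cat(
    [F.conv2d(x[:, c:c + 1], weight[:, c:c + 1]) for c in range(C)],
    dim=1,
)                                     # [B, C, 62, 62]

# Grouped approach: one call, with each group seeing a single channel.
grouped = F.conv2d(x, weight.view(C, 1, 3, 3), groups=C)    # [B, C, 62, 62]

# Largest absolute difference between the two results.
max_diff = (manual - grouped).abs().max().item()
print(manual.shape, grouped.shape, max_diff)
```

The two results should agree up to floating-point error, and both have the [B, 3, 62, 62] shape asked for in the original question.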