Why add an extra dimension to convolution layer weights?

Nikronic · June 25, 2020, 10:04pm

Hi,

Conv2d needs 2D kernels with 1 channel (grayscale mode, 3 in RGB). For having outputs with more than one, you need to run conv2d out_channel times using [1, k, k] size kernels so the result will be like [out_channel, h, w] because all the respones to out_channel different [1, k, k] kernels have been concatenated.

For instance, assume a case that your input image has 10 dimensions [batch_size, 10, h, w] and you want to have 3 as output channel, [batch_size, 3, h, w]. In this case, we need 3 different filters that each has size of [10, k, k]. Each one will create a output with size of [batch_size, 1, h, w] and finally all will be concatenated to have a output [batch, 3, h, w]. So, the kernel size in this case would be [3, 10, k, k].

This thread may also help:

Bests