Hi,
Conv2d filters are 2D kernels that span every input channel: 1 channel for a grayscale image, 3 for RGB. So for a grayscale input, each filter has size [1, k, k]. To get more than one output channel, the layer applies out_channel different [1, k, k] filters to the same input. The result has size [out_channel, h, w] because the responses of all out_channel filters are concatenated along the channel dimension.
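As a quick check of those shapes, here is a minimal sketch in PyTorch. The sizes (4 output channels, k = 3, an 8x8 input) are arbitrary values picked for illustration, and padding is set to k // 2 only so that h and w stay the same.

```python
import torch
import torch.nn as nn

k = 3  # kernel size, chosen only for illustration

# Grayscale input: each of the 4 filters has a single input channel.
gray_conv = nn.Conv2d(in_channels=1, out_channels=4, kernel_size=k, padding=k // 2)
# RGB input: each of the 4 filters spans all 3 input channels.
rgb_conv = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=k, padding=k // 2)

print(gray_conv.weight.shape)  # torch.Size([4, 1, 3, 3])
print(rgb_conv.weight.shape)   # torch.Size([4, 3, 3, 3])

# One grayscale image of size 8x8 -> 4 output channels of size 8x8.
gray_out = gray_conv(torch.randn(1, 1, 8, 8))
print(gray_out.shape)  # torch.Size([1, 4, 8, 8])
```

The first dimension of the weight is the number of filters (out_channel), and the second is the number of input channels each filter covers.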
For instance, assume your input has 10 channels, [batch_size, 10, h, w], and you want 3 output channels, [batch_size, 3, h, w] (with padding chosen so that h and w are preserved). In this case, we need 3 different filters, each of size [10, k, k]. Each one produces an output of size [batch_size, 1, h, w], and finally all of them are concatenated into an output of size [batch_size, 3, h, w]. So the kernel (weight) size in this case would be [3, 10, k, k].
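The 10-to-3 example can be reproduced by hand. The sketch below uses illustrative sizes (batch_size = 2, h = w = 8, k = 3) and compares the layer's output with the result of applying each of the 3 filters separately through torch.nn.functional.conv2d and concatenating along the channel dimension.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative sizes, not taken from the question.
batch_size, h, w, k = 2, 8, 8, 3
x = torch.randn(batch_size, 10, h, w)

conv = nn.Conv2d(in_channels=10, out_channels=3, kernel_size=k, padding=k // 2)
out = conv(x)

# Apply each of the 3 filters on its own, then concatenate the responses.
parts = []
for i in range(3):
    w_i = conv.weight[i:i + 1]  # one filter, shape [1, 10, k, k]
    b_i = conv.bias[i:i + 1]    # its bias, shape [1]
    parts.append(F.conv2d(x, w_i, b_i, padding=k // 2))  # [batch_size, 1, h, w]
manual = torch.cat(parts, dim=1)  # [batch_size, 3, h, w]

print(conv.weight.shape)  # torch.Size([3, 10, 3, 3])
print(out.shape)          # torch.Size([2, 3, 8, 8])
print(torch.allclose(out, manual, atol=1e-4))
```

The two results should agree up to floating-point tolerance, which is why the comparison uses allclose rather than exact equality.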
This thread may also help:
Best