Hi,
Yes, exactly. That is why kernel_size only takes (h, w) and
no channel dimension: the kernel's depth always has to match
the in_channels of the Conv2d layer. And you are right again:
out_channels is the number of filters, each of size
(in_channels, h, w), and their outputs are stacked along the
channel dimension at the end.
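
If it helps, here is a small sketch you can run to check the shapes yourself. The concrete numbers (3 input channels, 8 filters, a 5x5 kernel, a 32x32 input) are just values I picked for illustration, and it assumes the default groups=1, stride=1 and no padding:

```python
import torch
import torch.nn as nn

# Illustrative values only: 3 input channels, 8 filters, 5x5 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=(5, 5))

# The weight holds out_channels filters, each of size (in_channels, h, w).
weight_shape = tuple(conv.weight.shape)
print(weight_shape)  # (8, 3, 5, 5)

# One image with 3 channels; the 8 filter outputs are stacked as 8 output channels.
x = torch.randn(1, 3, 32, 32)
out_shape = tuple(conv(x).shape)
print(out_shape)  # (1, 8, 28, 28)
```

So you never pass the kernel depth yourself; it is taken from in_channels.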
Best