Hi,
No, this is not correct. Each kernel has the same number of channels as the input.
In our earlier discussion, Why add an extra dimension to convolution layer weights? - #2 by Nikronic, we agreed that if the input has shape [3, h, w] and we want 10 output channels, then we run 10 different filters, each of size [3, k, k]. Each filter creates a [1, h, w] response; since we wanted 10 output channels, we get 10 such [1, h, w] responses, and stacking them along the channel dim gives [10, h, w]. About the values: assume a 2D Gaussian for a [1, k, k] filter, and a 3D Gaussian for a [3, k, k] filter.
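A minimal PyTorch sketch of these shapes (the sizes h = w = 32 and k = 5 are just placeholders for illustration):

```python
import torch
import torch.nn as nn

# Input with 3 channels, e.g. an RGB image
x = torch.randn(1, 3, 32, 32)  # [batch, 3, h, w]

# 10 output channels => 10 filters, each of size [3, k, k]
conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, padding=2)

print(conv.weight.shape)  # torch.Size([10, 3, 5, 5]) -> 10 filters of [3, 5, 5]
print(conv(x).shape)      # torch.Size([1, 10, 32, 32]) -> 10 stacked [1, h, w] responses
```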
So,
- All values are drawn from some distribution, which leads to different values in each channel (they can be identical, but there is no reason for that)
- Based on my example, no: we have [4, 3, f, f] filters but the output is [4, h, w], and there is no averaging, as each response has only one channel and we stack them (see the sketch after this list)
- Based on point 2 above
- I am not really sure about this idea; do you mean skipping some windows in the convolution, or skipping some filters?
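A sketch of the second point, using the [4, 3, f, f] example (the sizes 32 and f = 5 are placeholders): each filter's response is a sum over the input channels, not an average, and the 4 single-channel responses are stacked:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 32, 32)   # [1, 3, h, w]
w = torch.randn(4, 3, 5, 5)     # [4, 3, f, f]: 4 filters, each [3, f, f]

out = F.conv2d(x, w, padding=2) # [1, 4, h, w]: 4 single-channel responses, stacked

# One filter's response: correlate each input channel with the matching
# kernel channel, then *sum* (not average) over channels -> one [h, w] map.
single = F.conv2d(x, w[0:1], padding=2)  # [1, 1, h, w]
per_channel = sum(
    F.conv2d(x[:, c:c+1], w[0:1, c:c+1], padding=2) for c in range(3)
)
print(torch.allclose(single, per_channel, atol=1e-5))  # True
```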
Bests