2D convolution in the expansive path of U-Net from 256 to 128 feature maps

Hi,
How does a 2D convolution in the expansive path of U-Net go from 256 to 128 feature maps?
I mean, normally a convolution is used when we want to extract more feature maps from an input.

The number of output channels corresponds to the number of kernels in the default setup (no grouped convs etc.).
In this particular example, you would use 128 kernels, each with in_channels set to 256, which means that each kernel processes all input channels and creates one activation map (channel) as its output.
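
As a minimal sketch of that setup in PyTorch: the kernel size of 3, the padding of 1, and the 64x64 input are assumptions picked only for illustration, not values from the question.

```python
import torch
import torch.nn as nn

# 256 input channels -> 128 output channels.
# kernel_size=3 and padding=1 are assumed values for this sketch.
conv = nn.Conv2d(in_channels=256, out_channels=128, kernel_size=3, padding=1)

# The weight holds 128 kernels, each spanning all 256 input channels.
print(conv.weight.shape)  # torch.Size([128, 256, 3, 3])

# Hypothetical input: a batch of one sample with 256 feature maps of size 64x64.
x = torch.randn(1, 256, 64, 64)
out = conv(x)

# One activation map per kernel, so 128 output channels.
print(out.shape)  # torch.Size([1, 128, 64, 64])
```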

Thanks for replying. Does that mean that, when each kernel processes all 256 input channels, the output map is built as the average of all 256 images? Or is there another way to get one activation map from 256?

The weight tensor has the shape [out_channels, in_channels, height, width], so each kernel has the shape [in_channels, height, width]. It thus does not take the average of the input activations but processes each input channel with a different set of weights (note the in_channels dimension) and sums the results into one activation map.
CS231n - Conv explains it in more detail with a few visualizations.
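
If it helps, here is a small sketch that checks this by hand. The sizes (4 input channels, 2 output channels, 8x8 input) are hypothetical and kept small so it runs quickly; the 256 to 128 case works the same way. The bias is left out to keep the comparison simple.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
in_ch, out_ch = 4, 2  # small hypothetical sizes for illustration
x = torch.randn(1, in_ch, 8, 8)
weight = torch.randn(out_ch, in_ch, 3, 3)

# Reference result: a standard convolution without bias.
ref = F.conv2d(x, weight, padding=1)

# Manual result: filter every input channel with its own 2D weight slice,
# then sum the filtered maps over the input channels.
manual = torch.zeros_like(ref)
for k in range(out_ch):
    for c in range(in_ch):
        manual[:, k:k + 1] += F.conv2d(
            x[:, c:c + 1], weight[k:k + 1, c:c + 1], padding=1
        )

# Both results should agree up to floating-point tolerance.
matches = torch.allclose(ref, manual, atol=1e-4)
print(matches)
```

So each output channel is a learned weighted sum over all input channels, and a plain average would only be the special case where all those weights happened to be identical.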

Thanks a lot for helping. Now it’s clear how it works.