Channel Convolution

In the depth-wise convolution layer of MobileNet-v1, each channel of the input is convolved with its own kernel of channel length 1; this is done by setting groups = in_channels in Conv2d, which produces 1-channel kernels (in_channels = out_channels in depth-wise convolution). In the paper MoBiNet ([1907.12629] MoBiNet: A Mobile Binary Network for Image Classification), the authors define the term K-dependency.
From what I understand, this means that each output channel is now the activation of a sum: its 1-channel kernel is convolved with every input channel in the same group, and the results are added together.
For example, suppose the input has 4 channels C1, C2, C3, C4, which we separate into 2 groups: group 1 contains C1, C2 and group 2 contains C3, C4. We also have four 1-channel kernels K1, K2, K3, K4. Then the output will be
O1 = Activation(C1 * K1 + C2 * K1),
O2 = Activation(C1 * K2 + C2 * K2),
O3 = Activation(C3 * K3 + C4 * K3),
O4 = Activation(C3 * K4 + C4 * K4)
(* is the convolution operation)
I have tried several ways to implement this in PyTorch, but none of them worked. Any suggestions on how to implement this?
Thank you in advance!
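
One possible direction, sketched only from the 4-channel example above and not checked against the paper's exact definition: since every output channel uses a single 1-channel kernel on all K input channels of its group, the kernel can be repeated K times along the input-channel dimension and passed to an ordinary grouped convolution with groups = in_channels / K. The class name `KDependencyConv2d`, the parameter name `k`, and the use of ReLU as the activation are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class KDependencyConv2d(nn.Module):
    """Hypothetical layer: each output channel applies one shared 1-channel
    kernel to every input channel of its group and sums the results."""

    def __init__(self, channels, k, kernel_size=3, padding=1):
        super().__init__()
        assert channels % k == 0, "channels must be divisible by k"
        self.k = k                     # input channels per group
        self.groups = channels // k    # number of groups
        self.padding = padding
        # One single-channel kernel per output channel: (C, 1, kH, kW).
        self.weight = nn.Parameter(
            torch.randn(channels, 1, kernel_size, kernel_size) * 0.1
        )

    def forward(self, x):
        # Repeat each kernel over the k input channels of its group to get
        # the (C, k, kH, kW) weight shape that a grouped conv expects.
        w = self.weight.repeat(1, self.k, 1, 1)
        out = F.conv2d(x, w, padding=self.padding, groups=self.groups)
        return F.relu(out)  # ReLU stands in for "Activation" (assumption)


# The example from the post: 4 channels split into 2 groups of 2.
x = torch.randn(1, 4, 8, 8)
layer = KDependencyConv2d(channels=4, k=2)
y = layer(x)  # output channel 0 is relu(C1 * K1 + C2 * K1), and so on
```

Because convolution is linear, C1 * K1 + C2 * K1 equals (C1 + C2) * K1, so an alternative would be to sum the channels within each group first and then convolve; the repeated-weight version is shown here because it keeps everything in a single `F.conv2d` call. With `k=1` the layer reduces to a plain depth-wise convolution.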