Help with grouped convolutions using local binary neural networks

I am trying to implement the EfficientNetV2-S model, substituting the 3x3 convolution blocks with local binary convolution blocks. However, I am confused about how to implement the depthwise separable scheme with them.

A local binary convolutional block (LBCB) works as follows: the weights of the 3x3 convolutions are replaced with fixed, randomly assigned values of 0, 1, and -1. The resulting maps are called difference maps; they are passed through ReLU to generate the binary maps, and finally learnable 1x1 convolutions combine them into feature maps.
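
To make the description above concrete, here is a minimal PyTorch sketch of how I understand the block. The class name `LocalBinaryConvBlock` and the parameters `num_anchor` (number of fixed 3x3 filters) and `sparsity` (fraction of non-zero weights) are my own assumptions for illustration, not names from the reference implementation.

```python
import torch
import torch.nn as nn


class LocalBinaryConvBlock(nn.Module):
    """Sketch of an LBCB: fixed ternary 3x3 conv -> ReLU -> learnable 1x1 conv."""

    def __init__(self, in_ch, out_ch, num_anchor=64, sparsity=0.5):
        super().__init__()
        # Fixed 3x3 convolution; its weights are set once and never trained.
        self.anchor = nn.Conv2d(in_ch, num_anchor, 3, padding=1, bias=False)
        with torch.no_grad():
            shape = self.anchor.weight.shape
            signs = torch.randint(0, 2, shape).float() * 2 - 1  # -1 or +1
            keep = (torch.rand(shape) < sparsity).float()       # 1 keeps, 0 zeroes
            self.anchor.weight.copy_(signs * keep)              # values in {-1, 0, 1}
        self.anchor.weight.requires_grad_(False)
        self.act = nn.ReLU()
        # Learnable 1x1 convolution that mixes the binary maps into feature maps.
        self.mix = nn.Conv2d(num_anchor, out_ch, 1)

    def forward(self, x):
        diff_maps = self.anchor(x)     # difference maps
        bit_maps = self.act(diff_maps)  # binary maps (after ReLU)
        return self.mix(bit_maps)       # feature maps
```

With this sketch, only `mix` contributes trainable parameters; the 3x3 weights stay at their random ternary values.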

Should I just perform the grouped convolution when computing the difference maps and let the 1x1 convolution act as the pointwise step? Or should I use the whole LBCB in place of the depthwise convolution and then apply a separate 1x1 pointwise convolution?
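
In case it helps to see what I mean, the two options could be sketched roughly like this in PyTorch. The helper names (`fixed_ternary_conv`, `option_a`, `option_b`, `learnable_params`) and the `maps_per_ch` parameter (number of difference maps per input channel) are hypothetical and only for illustration; this is a sketch of my reading of the two schemes, not a tested design.

```python
import torch
import torch.nn as nn


def fixed_ternary_conv(in_ch, out_ch, groups, sparsity=0.5):
    """3x3 conv with frozen random weights in {-1, 0, 1} (hypothetical helper)."""
    conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, groups=groups, bias=False)
    with torch.no_grad():
        signs = torch.randint(0, 2, conv.weight.shape).float() * 2 - 1
        keep = (torch.rand(conv.weight.shape) < sparsity).float()
        conv.weight.copy_(signs * keep)
    conv.weight.requires_grad_(False)
    return conv


def option_a(in_ch, out_ch, maps_per_ch=4):
    """Grouped difference maps; the LBCB's own 1x1 doubles as the pointwise conv."""
    mid = in_ch * maps_per_ch
    return nn.Sequential(
        fixed_ternary_conv(in_ch, mid, groups=in_ch),  # depthwise difference maps
        nn.ReLU(),                                     # binary maps
        nn.Conv2d(mid, out_ch, 1),                     # learnable 1x1 = pointwise
    )


def option_b(in_ch, out_ch, maps_per_ch=4):
    """Whole LBCB applied per channel, followed by a separate pointwise conv."""
    mid = in_ch * maps_per_ch
    return nn.Sequential(
        fixed_ternary_conv(in_ch, mid, groups=in_ch),  # depthwise difference maps
        nn.ReLU(),                                     # binary maps
        nn.Conv2d(mid, in_ch, 1, groups=in_ch),        # per-channel learnable 1x1
        nn.Conv2d(in_ch, out_ch, 1),                   # separate pointwise conv
    )


def learnable_params(module):
    """Count only the parameters that would be updated during training."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)


x = torch.randn(1, 24, 32, 32)
a, b = option_a(24, 48), option_b(24, 48)
print(a(x).shape, learnable_params(a))
print(b(x).shape, learnable_params(b))
```

One thing I noticed while writing this: in the second sketch the two 1x1 convolutions sit back to back with no nonlinearity between them, so as written they compose into a single linear map. I assume a normalization or activation layer would go between them, as in the usual depthwise separable blocks.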

For reference, I am attaching the link of the local binary neural network: