Combining input streams in a proper way

Say we have an intermediate layer of a neural network that gets two inputs:

  1. Output generated by a previous layer in the following format: N=2000 elements, Cin=10 channels, H=W=100 pixels
  2. Another input in the following format: N=2000 elements, Cin=1 channel, H=W=100 pixels

And we need to combine these two inputs into one and then apply a convolution to the combined data, which will have the following format: N=2000 elements, Cin = 10+1 = 11 channels, H=W=100 pixels
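To make the setup concrete, here is a minimal PyTorch sketch of the combine-then-convolve step (the batch is shrunk from the question's N=2000 so it runs quickly; all variable names are illustrative):

```python
import torch
import torch.nn as nn

# Two input streams with matching N, H, W but different channel counts
a = torch.randn(4, 10, 100, 100)  # output of the previous layer (Cin=10)
b = torch.randn(4, 1, 100, 100)   # the extra input stream (Cin=1)

# Combine along the channel dimension: 10 + 1 = 11 channels
x = torch.cat([a, b], dim=1)      # shape (4, 11, 100, 100)

# Convolve the combined data
conv = nn.Conv2d(in_channels=11, out_channels=10, kernel_size=3, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([4, 10, 100, 100])
```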

What is the most efficient way to do this?

I’ve considered several options, but all of them look bad.

  1. In the given layer, construct a new tensor that holds, for each element, the 10 channels from the first input stream and the 1 channel from the second one. This approach requires building a new, larger tensor on the fly, which is not very efficient.

  2. Use 11 input channels in all previous operations, but somehow restrict those operations to use only 10 of them, to avoid adding unused weights. Unfortunately, I haven't found a way to implement this approach.

  3. Use 11 input channels in all previous operations, and do not restrict them from using the 11th channel. This will create redundant connections (weights) between neurons.

Do you have any ideas?

If you want to avoid the “resizing” (concatenation), you can use the distributivity property:
conv(cat(A,B), cat(W1,W2), bias) = conv(A,W1) + conv(B,W2,bias)
and do two “conv” ops (10 to 10 and 1 to 10).
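A minimal sketch checking this identity numerically, with the weight for the concatenated input split along its input-channel dimension (shapes here are small and illustrative, not the ones from the question):

```python
import torch
import torch.nn.functional as F

A = torch.randn(2, 10, 8, 8)       # first stream, 10 channels
B = torch.randn(2, 1, 8, 8)        # second stream, 1 channel
W = torch.randn(10, 11, 3, 3)      # one weight for the concatenated 11-channel input
bias = torch.randn(10)

# Split the weight along the input-channel dim: W1 sees A, W2 sees B
W1, W2 = W[:, :10], W[:, 10:]

full = F.conv2d(torch.cat([A, B], dim=1), W, bias)
split = F.conv2d(A, W1) + F.conv2d(B, W2, bias)

print(torch.allclose(full, split, atol=1e-4))  # True
```

The two sides are mathematically identical; the tolerance only absorbs floating-point rounding from the different summation orders.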

Hi! Thank you for your answer. I use groups=1 in the conv operation (both outputs depend on both inputs), therefore the property has to be rewritten in the following way:
conv(cat(A,B), cat(W1,W2)) = cat(conv(A,cat(W1,W2)), conv(B,cat(W1,W2)))
And the implementation of this technique seems to be even less efficient than a single conv on the concatenated input.

Groups don’t matter; the principle is the same as with matrix multiplication:

with concatenated inputs you have
batch x 11 @ 11 x 10 = batch x 10

equivalent with split matrices:
batch x 10 @ 10 x 10 + batch x 1 @ 1 x 10

Now, two separate matmul/conv ops on smaller tensors are likely to be slower than a single op on the concatenated input. The only upside of avoiding cat is the memory saving (also note that the summation can be done in place).
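The matrix form of the same identity can be sketched directly (sizes are illustrative, matching the 11 = 10 + 1 channel split above):

```python
import torch

batch = 5
xa = torch.randn(batch, 10)   # first stream
xb = torch.randn(batch, 1)    # second stream
w = torch.randn(11, 10)       # one weight for the concatenated input

# batch x 11 @ 11 x 10
full = torch.cat([xa, xb], dim=1) @ w
# batch x 10 @ 10 x 10 + batch x 1 @ 1 x 10
split = xa @ w[:10] + xb @ w[10:]

print(torch.allclose(full, split, atol=1e-5))  # True
```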