Hello!
Say we have an intermediate layer of a neural network that receives two inputs:
- Output generated by a previous layer in the following format: N=2000 elements, Cin=10 channels, H=W=100 pixels
- Another input in the following format: N=2000 elements, Cin=1 channel, H=W=100 pixels
And we need to combine these two inputs into one and then apply a convolution to the combined data, which will have the following format: N=2000 elements, Cin = 10+1 = 11 channels, H=W=100 pixels
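For concreteness, here is the setup as I understand it, written as a hypothetical PyTorch sketch in NCHW layout; the names `x1` and `x2` are mine:

```python
import torch

x1 = torch.randn(2000, 10, 100, 100)  # output of the previous layer: (N, Cin=10, H, W)
x2 = torch.randn(2000, 1, 100, 100)   # second input:                 (N, Cin=1,  H, W)
# the combined tensor should have shape (2000, 11, 100, 100)
```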
What is the most efficient way to do this?
I’ve considered several options, but all of them look bad.
- In the given layer, construct a new tensor that holds, for each element, the 10 channels from the first input followed by the 1 channel from the second, i.e. concatenate along the channel dimension (a sketch follows this list). This means allocating a combined tensor and copying both inputs into it on every forward pass, which does not look very efficient.
- Use 11 input channels in all previous operations, but somehow restrict those operations to using only 10 of them, to avoid adding unused weights. Unfortunately, I did not find a way to implement this.
- Use 11 input channels in all previous operations and do not restrict them from using the 11th channel. This creates redundant connections (weights) between neurons.
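For reference, a minimal sketch of the first option, assuming PyTorch; `out_channels=10`, `kernel_size=3`, and `padding=1` are arbitrary values I picked for illustration, and I use a small batch here just to keep the sketch light:

```python
import torch
import torch.nn as nn

x1 = torch.randn(8, 10, 100, 100)  # output of the previous layer (real N would be 2000)
x2 = torch.randn(8, 1, 100, 100)   # second, single-channel input

combined = torch.cat([x1, x2], dim=1)  # (8, 11, 100, 100): one extra copy per forward pass
conv = nn.Conv2d(in_channels=11, out_channels=10, kernel_size=3, padding=1)
out = conv(combined)                   # (8, 10, 100, 100)
```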
Do you have any ideas?