I have a model where the outputs of two CNNs are concatenated with another vector before the final classification layer. I'm wondering what impact the dimension of the CNN outputs has on the final model. I have seen people use both size 1 and size 10.
I guess you are referring to the “dimension” of the feature tensors from both CNNs before concatenation?
If so, I think the same effects apply as in a “standard” model (i.e. one without this ensemble of CNN branches).
That is, the smaller the feature tensor, the more the information has to be compressed, and vice versa. Which size works best will most likely depend on the actual use case, so it's worth comparing a few values on your validation data.
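A minimal PyTorch sketch of such a model, assuming two image inputs and an extra vector of size 5 (all class names, layer choices, and sizes here are hypothetical). Only `feat_dim` changes between the two runs: with `feat_dim=1` each CNN has to squeeze its input into a single number before the concatenation, while with `feat_dim=10` it can pass on more information and the classifier gets more input features.

```python
import torch
import torch.nn as nn


class SmallCNN(nn.Module):
    """Hypothetical CNN branch mapping an image to a feat_dim-sized vector."""

    def __init__(self, in_channels: int, feat_dim: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # -> [N, 16, 1, 1] for any spatial size
            nn.Flatten(),             # -> [N, 16]
        )
        # This layer sets the "dimension" of the CNN output.
        self.proj = nn.Linear(16, feat_dim)

    def forward(self, x):
        return self.proj(self.features(x))


class TwoBranchModel(nn.Module):
    """Two CNN branches whose outputs are concatenated with an extra vector."""

    def __init__(self, feat_dim: int, extra_dim: int, num_classes: int):
        super().__init__()
        self.cnn_a = SmallCNN(3, feat_dim)
        self.cnn_b = SmallCNN(3, feat_dim)
        # The classifier input size grows with feat_dim: 2 * feat_dim + extra_dim
        self.classifier = nn.Linear(2 * feat_dim + extra_dim, num_classes)

    def forward(self, img_a, img_b, extra):
        feats = torch.cat([self.cnn_a(img_a), self.cnn_b(img_b), extra], dim=1)
        return self.classifier(feats)


for feat_dim in (1, 10):
    model = TwoBranchModel(feat_dim=feat_dim, extra_dim=5, num_classes=4)
    out = model(torch.randn(2, 3, 32, 32), torch.randn(2, 3, 32, 32), torch.randn(2, 5))
    print(feat_dim, model.classifier.in_features, tuple(out.shape))
```

This prints `1 7 (2, 4)` and `10 25 (2, 4)`: the output shape is the same, but with `feat_dim=1` the two CNN branches contribute only 2 of the 7 classifier inputs, whereas with `feat_dim=10` they contribute 20 of 25.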
That being said, note that the features are sometimes taken from the penultimate activations of a pretrained model, in which case their dimension is fixed by that model. Unless you add another layer on top of this feature extractor, you would just use the dimension given by the pretrained architecture.
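As a rough illustration: torchvision's `resnet18` produces 512 penultimate features once its `fc` layer is replaced by `nn.Identity()`. To keep the sketch self-contained (no weight downloads), the backbone below is a small hypothetical stand-in with the same 512-dim output, and `build_branch` is a hypothetical helper showing the two options: keep the fixed size, or add a `nn.Linear` on top to choose your own.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pretrained backbone with its classifier head
# removed (e.g. a torchvision ResNet with `fc = nn.Identity()`).
# Its output size is fixed by the architecture: 512 here.
backbone = nn.Sequential(
    nn.Conv2d(3, 512, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)
BACKBONE_DIM = 512


def build_branch(backbone: nn.Module, backbone_dim: int, feat_dim=None):
    """Return (branch, output_dim).

    Without feat_dim the backbone is used as-is, so the feature size stays at
    backbone_dim; otherwise a Linear layer projects the features to feat_dim.
    """
    if feat_dim is None:
        return backbone, backbone_dim
    return nn.Sequential(backbone, nn.Linear(backbone_dim, feat_dim)), feat_dim


x = torch.randn(2, 3, 32, 32)
for feat_dim in (None, 10):
    branch, out_dim = build_branch(backbone, BACKBONE_DIM, feat_dim)
    print(out_dim, tuple(branch(x).shape))
```

The first iteration prints `512 (2, 512)` and the second `10 (2, 10)`; the extra projection layer is what lets you pick the feature dimension when starting from a pretrained extractor.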