Combining different types of features for classification

I am using the first few layers of a pretrained AlexNet to extract features from a given image, then applying an RoI pooling layer to get a fixed-dimensional feature vector corresponding to one of the bounding boxes (BBox) in the image. I pass that vector to a single-layer classifier.
I now also want to use some extra features to improve the classification performance. I have about 3000-dimensional visual features and 4 other features (x, y, width, height) of the BBox. Should I just concatenate the 3000-dim visual features with the 4 extra features?

  1. Is it possible that the 3000 dims will dominate and keep the classifier from learning anything significant from the 4 extra features, since 3000 is much greater than 4?
  2. Should I do some kind of normalization before concatenating the 3000 dim features with the 4 dim features?
  3. Is it a good idea to add a batch norm layer on the 3000 dim features and another batch norm on the 4 dim features and then concatenate these?
  4. Should I project the 3000 dims to a smaller dimension or the 4 dims to a higher dimension and then concatenate them?
  5. Is there a better way than concatenation to use the two types of features for classification?



  1. Yes, it's better to use batch norm on the 3000-dim features. Are your 4 dims the bounding box location? Try normalizing both groups. Otherwise the difference in scale between them can lead to erratic results.
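
To make "normalizing both" concrete, here is a minimal sketch in plain Python. It assumes the box values are in pixels and that the image size is known. The helper names (`normalize_bbox`, `standardize`) and the toy numbers are made up for illustration, and the 5-dim lists stand in for the real 3000-dim features. The per-column standardization is roughly what a batch norm layer does during training, minus its learnable scale and shift.

```python
import math

def normalize_bbox(x, y, w, h, img_w, img_h):
    """Scale box coordinates to [0, 1] by dividing by the image size."""
    return [x / img_w, y / img_h, w / img_w, h / img_h]

def standardize(batch):
    """Z-score each feature column over the batch (zero mean, unit variance)."""
    n = len(batch)
    dims = len(batch[0])
    means = [sum(row[j] for row in batch) / n for j in range(dims)]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [
        math.sqrt(sum((row[j] - means[j]) ** 2 for row in batch) / n) or 1.0
        for j in range(dims)
    ]
    return [[(row[j] - means[j]) / stds[j] for j in range(dims)] for row in batch]

# Toy batch of 3 boxes; 5-dim stand-ins for the 3000-dim visual features.
visual = [
    [0.0, 12.5, 3.1, 0.4, 7.7],
    [1.2, 9.8, 2.9, 0.0, 6.1],
    [0.3, 15.0, 3.5, 0.9, 8.2],
]
boxes = [(48, 30, 120, 200), (300, 220, 64, 64), (10, 400, 500, 80)]  # x, y, w, h
img_w, img_h = 640, 480  # assumed image size in pixels

box_feats = standardize([normalize_bbox(*b, img_w, img_h) for b in boxes])
visual_feats = standardize(visual)

# Concatenate per sample: visual features first, then the 4 box features.
combined = [v + b for v, b in zip(visual_feats, box_feats)]
```

After this step every column has a comparable scale, so the 4 box features are no longer numerically dwarfed by raw activations or raw pixel coordinates.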

You could also try the following (I'm not sure it will work in your case):
1. Use upsampling (deconv) layers to increase the dimension of the 4-dim features.
2. Reduce the dimension of the 3000-dim features using conv layers. I think that if you simply flatten them, you will lose a lot of information.
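
As a rough illustration of the "bring the two sizes closer together" idea, the sketch below shrinks the 3000-dim vector and expands the 4-dim vector before concatenating. Note that it swaps in plain fully connected projections instead of the conv/deconv layers suggested above, purely to keep the example short. The weights are random placeholders (in a real model they would be learned), and the output sizes 128 and 32 are hypothetical values you would need to tune.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Random (untrained) weight matrix standing in for a learned layer."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]

def apply_linear(weights, vec):
    """Matrix-vector product followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in weights]

rng = random.Random(0)
VISUAL_DIM, BOX_DIM = 3000, 4
VISUAL_OUT, BOX_OUT = 128, 32  # hypothetical sizes, to be tuned

reduce_visual = make_linear(VISUAL_DIM, VISUAL_OUT, rng)  # 3000 -> 128
expand_box = make_linear(BOX_DIM, BOX_OUT, rng)           # 4 -> 32

visual_vec = [rng.random() for _ in range(VISUAL_DIM)]  # stand-in visual features
box_vec = [0.075, 0.0625, 0.1875, 0.4167]               # normalized x, y, w, h

# Concatenate the two projected vectors into one classifier input.
fused = apply_linear(reduce_visual, visual_vec) + apply_linear(expand_box, box_vec)
```

With these sizes the box branch contributes 32 of the 160 inputs to the classifier instead of 4 of 3004.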

Please explain a little more about the 3000-dim features, because concatenation in your case sounds a little unusual to me.