Merge CNN model and Bag of Words

I have a dataset consisting of images and their corresponding titles.
This is a classification problem with 30 classes.
I want to build a CNN model for the images and a Bag of Words model for the titles.
I built a vocabulary from the text data and got a Bag of Words output vector for each title.
Now I just want to merge the CNN model and the Bag of Words output to get a single output.
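
To make the text side concrete, here is a minimal sketch of the kind of Bag of Words vectors I mean. The titles, the whitespace tokenization, and the helper names `build_vocabulary` and `bag_of_words` are made up for illustration; my real pipeline may differ in details.

```python
from collections import Counter


def build_vocabulary(titles):
    """Map each distinct lowercase token to a column index, in first-seen order."""
    vocab = {}
    for title in titles:
        for token in title.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def bag_of_words(title, vocab):
    """Return a count vector with one entry per vocabulary word."""
    counts = Counter(title.lower().split())
    # dicts keep insertion order, so iterating vocab follows the column indices
    return [counts.get(token, 0) for token in vocab]


# Hypothetical titles, not taken from the real dataset.
titles = ["red running shoes", "blue shoes", "red red dress"]
vocab = build_vocabulary(titles)
vectors = [bag_of_words(title, vocab) for title in titles]
```

Each title becomes a vector of raw word counts whose length equals the vocabulary size, so every title has the same dimensionality regardless of its length.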

Originally, I wanted to concatenate the Bag of Words vectors to the first fully connected layer of the CNN.
But the vectors contain only word frequencies, and merging them directly into the fc layer doesn't seem to make sense.
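
Here is a rough sketch of the concatenation step I have in mind, written in plain Python so it does not depend on any framework. The `image_features` values are placeholders standing in for the flattened CNN output. Damping the counts with a log and L2-normalizing them before concatenation is only an assumption on my part about how to bring the frequencies onto a scale closer to the image features; I do not know whether it is the right approach.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned as zeros."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        return [0.0 for _ in vector]
    return [x / norm for x in vector]


def fuse(image_features, bow_counts):
    """Concatenate image features with log-damped, L2-normalized word counts."""
    # log1p(count) = log(1 + count) keeps zeros at zero and damps large counts
    text_features = l2_normalize([math.log1p(count) for count in bow_counts])
    return list(image_features) + text_features


# Placeholder for the flattened CNN output of one image (assumed values).
image_features = [0.12, -0.40, 0.88]
# Raw Bag of Words counts for the title of the same image.
bow_counts = [2, 0, 0, 0, 1]

fused = fuse(image_features, bow_counts)
```

The fused vector has length (number of image features + vocabulary size), and that would be the input size of the first fully connected layer.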

Do you have any advice?