Incorporating metadata into a ResNet50 object detector

Has anyone tried building something like late fusion with ResNet50-based object detectors (Fast R-CNN, Mask R-CNN)?

Essentially, there would be two models: the ResNet, and a smaller network that receives the image's metadata. The metadata carries very important information, such as the image size, the physical size of the image in inches, and categorical info about the image.

The hope is that combining the two gives the model better insights somewhere down the line, before it makes its classifications.
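To make the idea concrete, here is a minimal sketch of what such a fusion head could look like, assuming PyTorch. Everything in it is hypothetical and for illustration only: the class name `LateFusionHead`, the layer sizes, the number of metadata fields, and the random tensors standing in for real features. The 2048-dimensional input matches the pooled output of a ResNet50, but the random `image_feats` tensor is a placeholder, not an actual backbone.

```python
import torch
import torch.nn as nn


class LateFusionHead(nn.Module):
    """Hypothetical head: concatenates visual features with encoded metadata."""

    def __init__(self, image_feat_dim=2048, num_numeric=3, num_categories=10,
                 cat_embed_dim=8, meta_hidden=32, num_classes=5):
        super().__init__()
        # Learned embedding for the categorical metadata field.
        self.cat_embed = nn.Embedding(num_categories, cat_embed_dim)
        # Small network for the metadata branch.
        self.meta_mlp = nn.Sequential(
            nn.Linear(num_numeric + cat_embed_dim, meta_hidden),
            nn.ReLU(),
        )
        # Classifier that sees both branches at once (the late-fusion step).
        self.classifier = nn.Sequential(
            nn.Linear(image_feat_dim + meta_hidden, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, image_feats, numeric_meta, category_ids):
        # Encode the metadata: numeric fields plus the embedded category.
        meta = torch.cat([numeric_meta, self.cat_embed(category_ids)], dim=1)
        meta = self.meta_mlp(meta)
        # Fuse by concatenation, then classify.
        fused = torch.cat([image_feats, meta], dim=1)
        return self.classifier(fused)


head = LateFusionHead()
image_feats = torch.randn(4, 2048)         # placeholder for pooled ResNet50 features
numeric_meta = torch.randn(4, 3)           # e.g. normalized width, height, size in inches
category_ids = torch.randint(0, 10, (4,))  # one categorical field per sample
logits = head(image_feats, numeric_meta, category_ids)
print(logits.shape)
```

In a detector such as Mask R-CNN the classification happens per region proposal, so the per-image metadata vector would presumably need to be repeated for every region of that image before the concatenation. Numeric fields like the size in inches would likely also need normalizing so they do not dominate the learned features.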

Looking for advice and/or resources!

Grateful for any direction!