Merging 2 Neural Networks

I have two trained models: one for image segmentation and one for image localization.
The problem is that I want to segment a few classes and localize the others (on the same input image).
I have tried running both networks on the same image, and it does the job, but it is extremely slow.
So I was wondering if there is a way to merge the two networks, so that a single model gives me two outputs: the segmentation mask and the localization coordinates.

I was planning to use an encoder-decoder network, freezing the weights of the encoder while training the decoder to produce two outputs at three scales, along the lines of the sketch below.
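
For reference, here is roughly what I mean by freezing the encoder, as a minimal PyTorch sketch (the `EncoderDecoder` module and its layer sizes are just placeholders, not my actual U-Net):

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    # Toy stand-in; the real encoder/decoder would be the U-Net halves.
    def __init__(self):
        super().__init__()
        self.encoder = nn.Conv2d(3, 16, 3, padding=1)
        self.decoder = nn.Conv2d(16, 3, 3, padding=1)

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = EncoderDecoder()

# Freeze the pretrained encoder so only the decoder receives gradients.
for param in model.encoder.parameters():
    param.requires_grad = False

# Hand the optimizer only the parameters that are still trainable.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)
```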

But I am not sure if it will work, so I need suggestions and help.

What kind of CNNs are you using at the moment, and how do these architectures relate to your encoder-decoder model?

Do your current models share some layers (e.g. is the base of the model “similar”)?
If so, you could try retraining a new model that keeps the common architecture and adds a separate head for each task, along the lines of the sketch below.
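
As a minimal sketch of what I mean, assuming PyTorch (the layer shapes and head designs below are placeholders, not your actual architectures):

```python
import torch
import torch.nn as nn

class TwoHeadedModel(nn.Module):
    """One shared base, one head per task. All shapes are placeholders."""

    def __init__(self, num_seg_classes=3, num_box_outputs=4):
        super().__init__()
        # Shared feature extractor (the "common architecture").
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # Segmentation head: per-pixel class scores.
        self.seg_head = nn.Conv2d(64, num_seg_classes, 1)
        # Localization head: pooled features regressed to box coordinates.
        self.loc_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(64, num_box_outputs),
        )

    def forward(self, x):
        features = self.backbone(x)  # computed once per image
        return self.seg_head(features), self.loc_head(features)
```

The point is that the expensive feature extraction runs once per image; at training time you would combine the two task losses, e.g. as a weighted sum.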

I am using a U-Net for image segmentation and YOLOv3 for localization; they don’t share any layers yet.

I was planning to modify the U-Net by adding just the YOLO layers at the end, alongside the segmentation layers, and then training the decoder of the U-Net (roughly as sketched below), but I’m not sure if it will work.
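
Something like this is what I have in mind, as a rough PyTorch sketch (all layer sizes are placeholders, and I have left out the U-Net skip connections for brevity):

```python
import torch
import torch.nn as nn

class UNetWithYoloHeads(nn.Module):
    """Sketch: a U-Net-style decoder emitting a segmentation mask plus
    YOLO-style detection maps at three scales. Shapes are placeholders."""

    def __init__(self, num_seg_classes=3, num_det_classes=2, num_anchors=3):
        super().__init__()
        det_channels = num_anchors * (5 + num_det_classes)  # box(4)+obj(1)+cls

        # Placeholder encoder producing progressively smaller feature maps.
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.enc3 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(64, 128, 3, padding=1), nn.ReLU())

        # Placeholder decoder stages (no skip connections, for brevity).
        self.dec2 = nn.Sequential(nn.Upsample(scale_factor=2),
                                  nn.Conv2d(128, 64, 3, padding=1), nn.ReLU())
        self.dec1 = nn.Sequential(nn.Upsample(scale_factor=2),
                                  nn.Conv2d(64, 32, 3, padding=1), nn.ReLU())

        # Segmentation head on the full-resolution decoder output.
        self.seg_head = nn.Conv2d(32, num_seg_classes, 1)
        # One YOLO-style detection head per scale.
        self.det_heads = nn.ModuleList(
            nn.Conv2d(c, det_channels, 1) for c in (128, 64, 32)
        )

    def forward(self, x):
        f3 = self.enc3(self.enc2(self.enc1(x)))  # smallest scale
        f2 = self.dec2(f3)                       # middle scale
        f1 = self.dec1(f2)                       # full resolution
        seg = self.seg_head(f1)
        dets = [head(f) for head, f in zip(self.det_heads, (f3, f2, f1))]
        return seg, dets
```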