I have two trained models: one for image segmentation and one for image localization.
The problem is that I want to segment a few classes and localize the others, on the same input image.
I have tried running both networks on the same image, and that works, but it is extremely slow.
So I was wondering if there is a way to merge the two networks into one, so that a single forward pass gives me both outputs: the segmentation mask and the localization coordinates.
My plan was to use an encoder-decoder network, freezing the weights of the encoder while training the decoder to produce the two outputs at three scales.
But I am not sure whether this will work; I would appreciate any suggestions and help.
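For what it's worth, here is a minimal sketch of the shared-encoder, two-head idea in PyTorch (assuming that is your framework; the layer sizes, class count, and box parameterization below are all placeholders, and it uses a single scale rather than three for brevity). The encoder is frozen, and only the two task heads would be trained:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Hypothetical shared-encoder network with a segmentation head
    and a localization head (placeholder layer sizes)."""

    def __init__(self, num_seg_classes=3, num_box_coords=4):
        super().__init__()
        # Shared encoder: stand-in for your pretrained encoder.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Segmentation head: upsample features back to input resolution.
        self.seg_head = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),
            nn.ConvTranspose2d(16, num_seg_classes, 2, stride=2),
        )
        # Localization head: global pooling + linear regression of box coords.
        self.loc_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_box_coords),
        )

    def forward(self, x):
        feats = self.encoder(x)           # one shared forward pass
        return self.seg_head(feats), self.loc_head(feats)

model = MultiTaskNet()
# Freeze the (pretrained) encoder; only the two heads receive gradients.
for p in model.encoder.parameters():
    p.requires_grad = False

x = torch.randn(1, 3, 64, 64)
mask, boxes = model(x)
print(mask.shape)   # torch.Size([1, 3, 64, 64])
print(boxes.shape)  # torch.Size([1, 4])
```

In practice you would initialize the encoder from one of your existing trained models, and the total loss would be a weighted sum of the segmentation loss and the localization loss.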