Image Captioning using YOLO

Hello ,
I have been trying to extract the bounding box coordinates of my Image using a pre-trained YOLO model as you can find in the below code snippet. Can you please help how could I extract the bounding box coordinates from the YOLO code snippet and what changes I have to made in my code of DecoderRNN to concatenate my caption embedding with the yolo coordinates and then Fed it to the LSTM.


This is the YOLO code snippet