Image segmentation and object detection with bounding boxes

So I have the following dataset

I’m trying to create a model that draws a bounding box around the birds. It was suggested that I use a U-Net for image segmentation. So would the idea be to feed the bird image to the U-Net, have it output a segmentation map of my image, and then feed that segmentation map into a deep neural network that outputs the 4 numbers representing my bounding box?

The issue I’m having is that I now need a separate dataset to train the U-Net for segmentation. But how should that data be chosen? In the bird dataset, for example, a pixel is either bird or not bird. So do I need to find a segmentation dataset whose segmentation maps contain only 2 types of pixels? And does the segmentation data also need to be of birds?

I’d appreciate any insight and advice, regards

You could do that, but I would suggest using SAM (Segment Anything Model) for both segmenting and creating the bounding boxes. There are also a lot of online tutorials on this.
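For what it's worth, once you have a binary mask (from a U-Net, SAM, or anything else), the box can be read straight off the mask, so a second network that regresses 4 numbers isn't strictly needed. Here is a minimal sketch in plain Python, assuming the mask is a 2D grid where nonzero means "bird" and there is a single bird per image; `mask_to_bbox` is just an illustrative name, not part of any library:

```python
def mask_to_bbox(mask):
    """Return (x_min, y_min, x_max, y_max) covering all nonzero pixels.

    `mask` is a 2D grid (list of rows) where nonzero means foreground.
    Returns None when the mask contains no foreground pixel.
    """
    # Row indices that contain at least one foreground pixel
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]


# Toy 4x5 mask with a small blob standing in for a bird
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
print(mask_to_bbox(toy_mask))  # (1, 1, 3, 2)
```

With several birds in one image you would first have to split the mask into connected components and take one box per component, which this sketch does not do.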

That said, if you just want the bounding boxes for the data, you could run a recent YOLO model and it would be really easy to get the boxes you want. Since this isn’t a complex dataset, a YOLO model should do.
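A rough sketch of the YOLO route, under some assumptions of mine: it uses the third-party `ultralytics` package with its COCO-pretrained `yolov8n.pt` weights (picked only for illustration; COCO does include a `bird` class), and `keep_class` and `detect_birds` are hypothetical helper names. Treat it as a starting point rather than a tuned pipeline:

```python
def keep_class(detections, wanted="bird", min_conf=0.25):
    """Keep boxes whose class name matches `wanted` and whose score is high enough.

    `detections` is a list of (class_name, confidence, (x1, y1, x2, y2)) tuples.
    """
    return [box for name, conf, box in detections
            if name == wanted and conf >= min_conf]


def detect_birds(image_path, weights="yolov8n.pt"):
    """Run a pretrained YOLO model on one image and return the bird boxes.

    Assumes `pip install ultralytics`; the weights file name is an assumption
    and is downloaded by the package on first use.
    """
    from ultralytics import YOLO  # imported lazily: third-party dependency

    result = YOLO(weights)(image_path)[0]  # one Results object per input image
    detections = [
        (result.names[int(cls_id)], conf, tuple(xyxy))
        for cls_id, conf, xyxy in zip(
            result.boxes.cls.tolist(),
            result.boxes.conf.tolist(),
            result.boxes.xyxy.tolist(),
        )
    ]
    return keep_class(detections)


# Usage (the path is a placeholder for one of your images):
# boxes = detect_birds("path/to/bird.jpg")
```

If the pretrained model misses your birds, the same package can be fine-tuned on a handful of your own labeled boxes, but that is a separate step from what is shown here.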