Localizing Multiple Instances of a Single Object Class

This is my first time building a model that predicts multiple locations of an object (single class) in an image. My dataset consists of X and Y values: X holds the images, and Y holds, for each image, a list of bounding boxes (so Y is a list of lists of boxes).
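
A minimal sketch of this layout, for illustration only. The box format `[x_min, y_min, x_max, y_max]`, the coordinate values, and the helper `pad_boxes` are assumptions, not taken from the tutorial or from the real dataset. Because each image can have a different number of boxes, the sketch pads every list to a fixed length and keeps a mask marking which rows are real boxes:

```python
def pad_boxes(boxes_per_image, max_boxes):
    """Pad each image's box list to max_boxes rows and return a validity mask.

    Hypothetical helper: boxes are assumed to be [x_min, y_min, x_max, y_max].
    Lists longer than max_boxes are truncated.
    """
    padded, mask = [], []
    for boxes in boxes_per_image:
        kept = [[float(v) for v in box] for box in boxes[:max_boxes]]
        n_pad = max_boxes - len(kept)
        # Padding rows are all zeros; the mask tells real boxes from padding.
        padded.append(kept + [[0.0, 0.0, 0.0, 0.0] for _ in range(n_pad)])
        mask.append([True] * len(kept) + [False] * n_pad)
    return padded, mask


# Made-up Y values for two images: the first has two boxes, the second has one.
Y = [
    [[5, 5, 20, 12], [30, 40, 60, 50]],
    [[10, 10, 25, 18]],
]

Y_padded, Y_mask = pad_boxes(Y, max_boxes=3)
for boxes, valid in zip(Y_padded, Y_mask):
    print(boxes, valid)
```

Padding with a mask is only one possible way to handle a variable number of boxes per image; whether it fits depends on what the chosen model expects as targets.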

Given a single image, I want the model to predict all of the bounding boxes in it.

It seems that this tutorial is the closest to my use case.

But I have two questions:

  1. Since I only have one class, if I delete or modify the lines of code in the tutorial that handle the classification task, would the tutorial's model (which does both regression and classification) no longer be suitable for my use case?

  2. I don't think the model pre-trained on COCO would be suitable, since my images are scanned text pages. Is that right? Should I instead train the model (say, ResNet50) on my own dataset?

Thank you all for your help and time.