How to parse a CSV annotation file when each image contains multiple objects

I have annotations in a CSV file and the corresponding images; each image contains more than one object. The CSV has a column for the image name, columns for the bounding box coordinates in xmin, ymin, xmax, ymax format (top-left and bottom-right corners respectively), and a column for the class name, such as person, car, cat, etc.

I tried to use custom dataset preprocessing code and data loaders from several sources, but I get an error saying that the number of images and the number of annotations are not the same. This happens because an image with more than one object has its name repeated in the image name column, once for each bounding box.

I would appreciate it if anyone could point me to a reference for solving this problem or give me some skeleton sample code. My final goal is to build an object detection model that detects objects and classifies them by category (aeroplane, car, cat, etc.). All images are in the same folder; they are not split into folders by category.
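One possible way to resolve the length mismatch is to group the CSV rows by image name, so that the dataset is indexed by unique image and each image carries a list of all its boxes and labels. The following is only a minimal sketch using the Python standard library. The column names (`image_name`, `xmin`, `ymin`, `xmax`, `ymax`, `class_name`), the helper `group_annotations`, the class `DetectionAnnotations`, and the sample rows are all assumptions made for illustration and would need to be adapted to the real file.

```python
import csv
import io
from collections import defaultdict


def group_annotations(csv_file):
    """Group CSV rows by image so each image maps to all of its boxes and labels.

    Column names below are assumed; rename them to match the real CSV header.
    """
    grouped = defaultdict(lambda: {"boxes": [], "labels": []})
    for row in csv.DictReader(csv_file):
        entry = grouped[row["image_name"]]
        entry["boxes"].append(
            [float(row[key]) for key in ("xmin", "ymin", "xmax", "ymax")]
        )
        entry["labels"].append(row["class_name"])
    return dict(grouped)


class DetectionAnnotations:
    """Hypothetical dataset-like wrapper indexed by unique image, not by CSV row."""

    def __init__(self, grouped, class_names):
        self.grouped = grouped
        self.image_names = sorted(grouped)  # one entry per image
        self.class_to_idx = {name: i for i, name in enumerate(class_names)}

    def __len__(self):
        # Number of images, which is smaller than the number of CSV rows
        return len(self.image_names)

    def __getitem__(self, idx):
        name = self.image_names[idx]
        target = self.grouped[name]
        labels = [self.class_to_idx[c] for c in target["labels"]]
        # Image loading is omitted here; the file would be opened from the
        # shared image folder using `name`.
        return name, {"boxes": target["boxes"], "labels": labels}


# Made-up sample data standing in for the real annotation file
sample_csv = io.StringIO(
    "image_name,xmin,ymin,xmax,ymax,class_name\n"
    "img1.jpg,10,20,50,80,person\n"
    "img1.jpg,60,30,120,90,car\n"
    "img2.jpg,5,5,40,40,cat\n"
)

grouped = group_annotations(sample_csv)
dataset = DetectionAnnotations(grouped, class_names=["person", "car", "cat"])
```

If this were used with a PyTorch `DataLoader`, the default batching would likely fail because images have different numbers of boxes; passing a custom `collate_fn` that keeps each batch as a tuple of per-image items, for example `lambda batch: tuple(zip(*batch))`, is a common workaround.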