Does redundancy in ground truth labels hurt model performance?

Let’s say I am training an object detection model whose goal is to predict both the class and the bounding box.

Now, if a majority of the ground truth bounding boxes in my dataset have the same values, will this in any way impact the model’s ability to learn?

For example, while training a basic object detection model on the dataset described above, I noticed that when the model was given new, unseen images, it was unable to correctly predict the bounding box.

A preview of the dataset is shown below:

Class 1
image 1 bbox coords 192 63 242 243
image 2 bbox coords 192 63 242 243
...
image 19

In each image, the object is either zoomed in or viewed from a different angle.
(Also, please note that the bounding box coordinates are not an exact representation of my dataset; they were included only to give an idea of what the dataset looks like.)
I tried computing the R² score between the predicted and actual bounding boxes, and found that for unseen examples the R² score came out highly negative.
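For reference, this is roughly how I computed it (the box values here are made up for illustration; the point is that a model which keeps outputting the majority box does worse than simply predicting the mean, which is what drives R² negative):

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical ground-truth boxes (x1, y1, x2, y2), one row per unseen image.
y_true = np.array([[192, 63, 242, 243],
                   [190, 60, 240, 240],
                   [310, 120, 400, 260]])

# A model that has collapsed onto the "majority" box predicts nearly the
# same coordinates for every image, including the genuinely different one.
y_pred = np.array([[192, 63, 242, 243],
                   [193, 64, 241, 242],
                   [195, 65, 245, 244]])

# R^2 averaged over the four coordinates: always predicting the mean gives 0,
# so a constant wrong box on varied targets gives a negative score.
print(r2_score(y_true, y_pred))
```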

My guess is that the model might learn to predict the “majority” bounding box coordinates, in the same way it would learn to predict the majority class in an imbalanced classification task.

So, in a sense, can we assume there is some overfitting? If so, would you recommend changing the dataset so that the bounding box distribution is properly captured?
I tried using weighted sampling during training, but that did not give any significant improvement.
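The weighting scheme I tried was along these lines (a minimal NumPy sketch with made-up boxes and counts; my actual pipeline wraps the same per-sample weights in a data-loader sampler): each image is weighted inversely to how often its exact box occurs, so the rare-box images are drawn about as often as the majority-box ones.

```python
import numpy as np

# Hypothetical dataset: 19 copies of the "majority" box plus 2 rare boxes.
boxes = np.array([[192, 63, 242, 243]] * 19 + [[310, 120, 400, 260]] * 2)

# Count how often each distinct box occurs and weight samples by 1/count,
# so each distinct box gets equal total sampling mass.
uniq, inverse, counts = np.unique(boxes, axis=0,
                                  return_inverse=True, return_counts=True)
weights = 1.0 / counts[inverse]
probs = weights / weights.sum()

# Draw training indices with these probabilities.
rng = np.random.default_rng(0)
sampled = rng.choice(len(boxes), size=1000, p=probs)

# Roughly half the draws now come from the 2 rare-box images (indices 19, 20).
rare_fraction = (sampled >= 19).mean()
print(rare_fraction)
```

Even with the rare boxes oversampled like this, the validation predictions barely moved, which is why I suspect the issue is the dataset itself rather than the sampling.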