Correct labeling for a two-class ResNet50 Faster R-CNN setup

I am using ResNet50 as the backbone for a Faster R-CNN setup on ultrasound images. I have done initial training (after downloading the default weights) on the BUSI ultrasound set and now have another set to further refine the model. My data consists of two classes of ultrasound images (benign and malignant), each made up of frames from ultrasound (US) video. Only some of the frames have annotated bounding boxes.
I’m using two classes: one for normal (any image that does not have an annotation) and a second for anything with annotations (including benign frames that have annotations). To get the unannotated images through the pipeline, I’ve set them up as in the snippet below.

Here is how I load the default model, followed by the snippet that sets the dataloader values for a background (normal) image:

# Model load
import torch
import torchvision

modelr = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=False,
                                                              num_classes=2,
                                                              pretrained_backbone=True)
#############
#Assigning data
            #for a Normal (label 0)
            boxes = torch.zeros((0, 4), dtype=torch.float32)
            area = nr*nc #number of image rows x columns #using full image size for area
            boxes = torch.as_tensor(boxes, dtype=torch.double)
            area = torch.as_tensor(area, dtype=torch.double)

            label = torch.zeros((1,), dtype=torch.int64)

            iscrowd=torch.ones((1,), dtype=torch.int64)
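
For comparison, here is a minimal sketch of how targets are commonly built for torchvision's Faster R-CNN, assuming a torchvision version that accepts images with no boxes. The helper names `empty_target` and `annotated_target` are hypothetical, and boxes are assumed to be in `[x1, y1, x2, y2]` pixel format. As far as I understand, label 0 is reserved for background, so `num_classes=2` means background plus one foreground label, and every per-box field has one entry per box. An unannotated frame therefore gets zero-length `labels`, `area`, and `iscrowd` rather than tensors of shape `(1,)`.

```python
import torch


def empty_target() -> dict:
    """Target for a frame with no annotated box (hypothetical helper).

    Every per-box field has length 0, matching the (0, 4) boxes tensor.
    """
    return {
        "boxes": torch.zeros((0, 4), dtype=torch.float32),
        "labels": torch.zeros((0,), dtype=torch.int64),
        "area": torch.zeros((0,), dtype=torch.float32),
        "iscrowd": torch.zeros((0,), dtype=torch.int64),
    }


def annotated_target(xyxy_boxes: list) -> dict:
    """Target for a frame with one or more annotated boxes (hypothetical helper).

    All boxes get label 1, the single foreground class; 0 is left for background.
    """
    boxes = torch.as_tensor(xyxy_boxes, dtype=torch.float32).reshape(-1, 4)
    n = boxes.shape[0]
    # Box area = width * height, computed per box.
    area = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return {
        "boxes": boxes,
        "labels": torch.ones((n,), dtype=torch.int64),
        "area": area,
        "iscrowd": torch.zeros((n,), dtype=torch.int64),
    }


normal = empty_target()
lesion = annotated_target([[10.0, 20.0, 50.0, 80.0]])
```

The detection model itself reads only `boxes` and `labels` from the target during training; `area` and `iscrowd` matter to COCO-style evaluation code, so their values should not affect the loss.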

Since I have two classes, and each test image comes from a series of frames of which only a small selection may have ground truth, is it correct to assign normal to “background” and not-normal to label 1? (Does num_classes=2 mean background plus one other label?) Getting images without annotations into the Faster R-CNN format required some trickery (area size, iscrowd), which I’m not sure is entirely correct.
I get decent IoU on one dataset, but I suspect I have a core setup issue somewhere. Occasionally a test output comes back empty for the predicted bounding box, label, and score. Is this normal for a two-label Faster R-CNN setup? Having a series of images all taken from one video, with one overall label and only some of the frames carrying a ground-truth annotation, is a bit confusing.
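
On the empty outputs: torchvision's Faster R-CNN keeps only detections above its score threshold (`box_score_thresh`, 0.05 by default), so a frame with no confident detection comes back with empty boxes, labels, and scores. One way to make such frames explicit during evaluation is sketched below, assuming boxes are plain `[x1, y1, x2, y2]` lists. The function names and the status strings are hypothetical conventions, not part of any library.

```python
def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def score_frame(pred_boxes, gt_boxes):
    """Return (status, iou) for one frame, handling empty inputs explicitly."""
    if not gt_boxes and not pred_boxes:
        # Normal frame, nothing predicted: correct, but there is no IoU to report.
        return ("true_negative", None)
    if not gt_boxes:
        # Normal frame, but the model predicted a box.
        return ("false_positive", 0.0)
    if not pred_boxes:
        # Annotated frame, but the model returned nothing.
        return ("missed", 0.0)
    # Mean over ground-truth boxes of the best-overlapping prediction.
    best = [max(box_iou(p, g) for p in pred_boxes) for g in gt_boxes]
    return ("matched", sum(best) / len(best))


results = [
    score_frame([], []),
    score_frame([[0, 0, 10, 10]], []),
    score_frame([], [[0, 0, 10, 10]]),
    score_frame([[5, 0, 15, 10]], [[0, 0, 10, 10]]),
]
```

Counting the four statuses separately keeps true negatives on normal frames from being averaged in as zero IoU.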