I've been trying to train a RetinaNet on the SKU110K dataset. The first issue I encountered is that the retinanet_resnet50_fpn_v2() function throws the following error when num_classes=2:
```python
model = retinanet_resnet50_fpn_v2(weights='DEFAULT', score_thresh=0.35, num_classes=2).to(device)
```

```
ValueError: The parameter 'num_classes' expected value 91 but got 2 instead.
```
My dataset has only 2 classes (object and background), so why does the retinanet_resnet50_fpn_v2 function require num_classes=91?
The other issue is that each input image has a different number of bounding boxes associated with it. My Dataset loads items as follows:
```python
import numpy as np
import torch
from PIL import Image


def __getitem__(self, idx):
    if torch.is_tensor(idx):
        idx = idx.tolist()
    img_name = self.image_names[idx]
    rows = self.ds[self.ds.image_name == img_name]  # all annotations for this image
    # width/height are the ORIGINAL image size stored in the annotation CSV
    class_name, width, height = rows.iloc[0, 5:8]
    # float32 so the rescaled coordinates are not truncated back to integers
    boxes = rows.iloc[:, 1:5].to_numpy(dtype=np.float32)
    image = Image.open(img_name).convert("RGB")
    image = image.resize((640, 640))
    # Rescale everything to 640: the scale factors must come from the original size
    # (image.width is already 640 after resize) and use true division, not //
    Fx = width / 640
    Fy = height / 640
    boxes[:, [0, 2]] /= Fx  # x1, x2
    boxes[:, [1, 3]] /= Fy  # y1, y2
    image = self.transform(image)  # [T.ToImageTensor(), T.ConvertImageDtype()]
    sample = {"img": image, "boxes": boxes, "label": 1}
    return sample
```
In this scenario, sample['img'] is a Tensor of shape (3, 640, 640) and sample['boxes'] is an array of shape (N, 4), where N is the number of bounding boxes in the image.
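For reference, torchvision's detection models in training mode expect each target as a dict with "boxes" (a float tensor of shape [N, 4] in x1, y1, x2, y2 pixel coordinates) and "labels" (an int64 tensor of shape [N], one label per box), so a single "label": 1 per image would need to be expanded. A minimal sketch, assuming every box belongs to the single object class 1; the helper name to_detection_target is hypothetical:

```python
import numpy as np
import torch


def to_detection_target(boxes: np.ndarray, label: int = 1) -> dict:
    """Hypothetical helper: turn an (N, 4) box array into a torchvision-style detection target."""
    # reshape(-1, 4) keeps an image with zero boxes as a valid (0, 4) tensor
    boxes_t = torch.as_tensor(boxes, dtype=torch.float32).reshape(-1, 4)
    # One int64 label per box; here every box is the single "object" class
    labels_t = torch.full((boxes_t.shape[0],), label, dtype=torch.int64)
    return {"boxes": boxes_t, "labels": labels_t}


example = to_detection_target(np.array([[0, 0, 10, 10], [5, 5, 20, 20]]))
print(example["labels"])  # tensor([1, 1])
```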
As a result, when I get a batch from the DataLoader, I get the following error:

```
stack expects each tensor to be equal size, but got [74, 4] at entry 0 and [128, 4] at entry 1
```
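This error comes from the DataLoader's default collate_fn, which tries to stack each field across the batch and therefore needs every "boxes" array to have the same N. Detection pipelines usually keep per-sample items in lists instead (torchvision's own detection reference scripts use a collate_fn that returns tuple(zip(*batch))). A minimal sketch for the dict-shaped samples above; the name collate_as_lists and the toy samples are hypothetical:

```python
from typing import Any

import numpy as np
import torch
from torch.utils.data import DataLoader


def collate_as_lists(batch: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Group samples field by field without stacking: each field becomes a list of length batch_size."""
    return {key: [sample[key] for sample in batch] for key in batch[0]}


# Toy stand-in for the SKU110K Dataset: two samples with 74 and 128 boxes.
toy_samples = [
    {"img": torch.rand(3, 640, 640), "boxes": np.zeros((74, 4), dtype=np.float32), "label": 1},
    {"img": torch.rand(3, 640, 640), "boxes": np.zeros((128, 4), dtype=np.float32), "label": 1},
]
loader = DataLoader(toy_samples, batch_size=2, collate_fn=collate_as_lists)
batch = next(iter(loader))
print([b.shape for b in batch["boxes"]])  # [(74, 4), (128, 4)]
```

A list of (3, H, W) image tensors is also exactly what RetinaNet's forward accepts, so batch["img"] can be passed to the model without stacking.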
Are there any RetinaNet examples that use torchvision's v2 model (retinanet_resnet50_fpn_v2)?