Loading with datasets.CocoDetection: how to get a binary mask?

torchvision.datasets.CocoDetection returns tensors for the images and a list of tensors for the segmentations in each image. I’m struggling to understand how to work with this for semantic segmentation training.

I think I want to convert this list of segmentations into binary masks, but I’m having trouble figuring out how.

Can somebody help me?

You can use the COCO API to load the mask for each annotation with the annToMask function.

Do you know how I can use that in conjunction with the dataloader to get a batch of masks?

It is quite simple. You can just add your snippet to the __getitem__ function of your dataset, and the DataLoader will batch it for you!
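As a rough sketch of what that suggestion could look like: the class below is a hypothetical map-style dataset (the names `CocoBinaryMaskDataset` and `load_image` are made up for illustration) whose `__getitem__` merges every annotation of an image into one binary mask with `annToMask`. It assumes `coco` is a `pycocotools.coco.COCO` instance and leaves image loading to a caller-supplied function.

```python
import numpy as np


class CocoBinaryMaskDataset:
    """Hypothetical map-style dataset: one (image, binary mask) pair per image."""

    def __init__(self, coco, load_image):
        # coco: assumed to be a pycocotools.coco.COCO instance.
        # load_image: caller-supplied function mapping an image record to an array.
        self.coco = coco
        self.load_image = load_image
        self.img_ids = sorted(coco.getImgIds())

    def __len__(self):
        return len(self.img_ids)

    def __getitem__(self, index):
        coco = self.coco
        img_info = coco.loadImgs(self.img_ids[index])[0]
        anns = coco.loadAnns(coco.getAnnIds(imgIds=img_info["id"], iscrowd=None))

        # Start from an empty mask so images without annotations still work.
        mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)
        for ann in anns:
            # annToMask returns a 0/1 array of shape (height, width);
            # np.maximum keeps the union binary where instances overlap.
            mask = np.maximum(mask, coco.annToMask(ann))

        return self.load_image(img_info), mask
```

A `torch.utils.data.DataLoader` wrapped around such a dataset stacks the items into batches, but only if every image and mask in a batch has the same size, so a resize or crop inside `__getitem__` (or a custom `collate_fn`) is usually needed as well.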

I tried to use annToMask as shown below. I modified my model, which originally used the Pascal VOC dataset; now I am using COCO 2014.

def __getitem__(self, index):
    raw_img, anno_class_img = self.pull_item(index)
    return raw_img, anno_class_img

def pull_item(self, index):
    coco = self.coco
    img = coco.loadImgs(self.id_list[index])[0]
    image_file_path = "./{}2014-2/{}".format(self.phase, img["file_name"])
    raw_img = Image.open(image_file_path)
    raw_img = raw_img.convert('RGB')   

    cat_ids = coco.getCatIds() 
    anns_ids = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
    anns = coco.loadAnns(anns_ids)
    mask = coco.annToMask(anns[0])
    for i in range(len(anns)):
        mask += coco.annToMask(anns[i])
    anns_img = Image.fromarray(mask)
    raw_img = self.transform(raw_img)
    anns_img = self.transform(anns_img)
    return raw_img, anns_img
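One thing worth checking in the snippet above: `anns[0]` is added to `mask` twice (once at initialization and again in the loop), and overlapping objects sum to values that are not class labels. For semantic segmentation with a cross-entropy loss, the target normally holds one class index per pixel. A minimal sketch of that idea follows, assuming `coco` is a `pycocotools.coco.COCO` instance; the helper names `make_cat_id_to_index` and `build_class_mask` are hypothetical.

```python
import numpy as np


def make_cat_id_to_index(coco):
    # COCO category ids have gaps, so map them to 1..N; 0 is kept for background.
    return {cat_id: i + 1 for i, cat_id in enumerate(sorted(coco.getCatIds()))}


def build_class_mask(coco, img_info, cat_id_to_index):
    anns = coco.loadAnns(coco.getAnnIds(imgIds=img_info["id"], iscrowd=None))
    mask = np.zeros((img_info["height"], img_info["width"]), dtype=np.uint8)
    # Paint large objects first so smaller ones stay visible on top
    # (a design choice for this sketch, not something the thread specifies).
    for ann in sorted(anns, key=lambda a: a["area"], reverse=True):
        mask[coco.annToMask(ann) == 1] = cat_id_to_index[ann["category_id"]]
    return mask
```

With a mask like this, the number of output channels of the network would need to be the number of categories plus one for background.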

Below is part of my training function:

for images, labels in dataloaders_dict[phase]:
    if images.size()[0] == 1:
        ...  # body omitted in the original post
    images = images.to(device)
    labels = torch.squeeze(labels)
    labels = labels.to(device)
    if (phase == 'train') and (count == 0):
        ...  # body omitted in the original post
    with torch.set_grad_enabled(phase == 'train'):
        outputs = net(images)
        loss = criterion(outputs, labels.long())

But this is not working: the loss is far too low, and when I run inference on test pictures, it classifies every pixel as black, which I guess means non-object or background. I have no idea what to do. Any suggestions?
|| Loss: 0.2654 || 10iter: 45.0074 sec.
|| Loss: 0.0304 || 10iter: 34.7093 sec.
|| Loss: 0.0018 || 10iter: 53.4208 sec.
|| Loss: 0.0001 || 10iter: 34.5700 sec.
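
One possible cause, stated here as an assumption because `self.transform` is not shown: if the transform applied to the mask includes `torchvision.transforms.ToTensor`, it rescales uint8 pixel values to the range [0, 1]. Labels such as 1 or 2 then become roughly 0.004 and 0.008, and `labels.long()` truncates all of them to 0. The NumPy sketch below only mimics that arithmetic; it does not call torchvision.

```python
import numpy as np

# A tiny label mask: 0 = background, 1 and 2 = object classes.
mask = np.array([[0, 1], [2, 1]], dtype=np.uint8)

# Mimic what an image-style transform does: rescale uint8 values to [0, 1].
scaled = mask.astype(np.float32) / 255.0

# Casting the rescaled mask back to integers (as labels.long() would)
# drops every label to 0.
labels_after_scaling = scaled.astype(np.int64)

# Converting the mask directly keeps the class indices intact.
labels_direct = mask.astype(np.int64)

print(labels_after_scaling.tolist())  # [[0, 0], [0, 0]]
print(labels_direct.tolist())         # [[0, 1], [2, 1]]
```

If the targets really are all zeros, a network can drive the loss toward zero simply by predicting background everywhere, which would match the log above. In that case it may help to convert the mask without rescaling (for example with `torch.from_numpy(mask).long()`) and to use nearest-neighbor interpolation for any resizing of the mask.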

        for category in anns:
            seg_rle = category['segmentation']
            tmp = decode(frPyObjects(seg_rle, raw_img.size[1], raw_img.size[0]))
            if tmp.ndim == 3:
                tmp = np.sum(tmp, axis=2, dtype=np.uint8)
            category['segmentation'] = tmp
        for category in anns:
            pilImg = Image.fromarray(category['segmentation'])
            anns_img = pilImg.resize((raw_img.size[1], raw_img.size[0]), resample=Image.NEAREST)

I also tried the variation above for my dataloader, but with no luck.