Loading with datasets.CocoDetection - how to get binary mask?

torchvision.datasets.CocoDetection returns the image together with a list of annotation dictionaries containing the segmentations for that image. I’m struggling to understand how to work with this for semantic segmentation training.

I think I want to convert this list of segmentations into binary masks, but I’m having trouble figuring out how.

Can somebody help me?


You can use the COCO API (pycocotools) to load the mask for each annotation with the annToMask function.
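
Roughly like this (a minimal sketch; the annotation file path is a placeholder):

from pycocotools.coco import COCO
import numpy as np

coco = COCO("annotations/instances_train2014.json")  # placeholder path

img_id = coco.getImgIds()[0]
anns = coco.loadAnns(coco.getAnnIds(imgIds=img_id))

# annToMask decodes one annotation to an HxW uint8 array (1 = object);
# np.maximum unions them into a single binary mask
info = coco.loadImgs(img_id)[0]
mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
for ann in anns:
    mask = np.maximum(mask, coco.annToMask(ann))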

Do you know how I can use that in conjunction with the dataloader to get a batch of masks?

It is quite simple. You can just add your snippet to the __getitem__ function of your Dataset; the DataLoader will batchify it for you!
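
For example, a rough sketch (the dataset class name and paths here are hypothetical, and both image and mask are resized to a fixed size so the default collate can stack them into a batch):

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from torchvision.transforms import functional as TF
from PIL import Image
from pycocotools.coco import COCO

class CocoBinaryMaskDataset(Dataset):  # hypothetical name
    def __init__(self, img_dir, ann_file, size=(256, 256)):
        self.coco = COCO(ann_file)
        self.ids = list(self.coco.imgs.keys())
        self.img_dir = img_dir
        self.size = size

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        info = self.coco.loadImgs(self.ids[index])[0]
        image = Image.open("{}/{}".format(self.img_dir, info['file_name'])).convert('RGB')
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=info['id']))

        # Union of all annotation masks -> one binary mask per image
        mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
        for ann in anns:
            mask = np.maximum(mask, self.coco.annToMask(ann))

        image = TF.to_tensor(image.resize(self.size, resample=Image.BILINEAR))
        # NEAREST keeps the mask binary; no ToTensor, so labels stay 0/1
        mask_img = Image.fromarray(mask).resize(self.size, resample=Image.NEAREST)
        mask = torch.from_numpy(np.array(mask_img)).long()
        return image, mask

loader = DataLoader(CocoBinaryMaskDataset("train2014", "instances_train2014.json"),
                    batch_size=4, shuffle=True)
images, masks = next(iter(loader))  # masks: (4, 256, 256) LongTensor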


I tried to use annToMask as shown below. I modified my model, which originally used the Pascal VOC dataset; now I am using COCO 2014.

def __getitem__(self, index):
    raw_img, anno_class_img = self.pull_item(index)
    return raw_img, anno_class_img

def pull_item(self, index):
    coco = self.coco
    img = coco.loadImgs(self.id_list[index])[0]
    image_file_path = "./{}2014-2/{}".format(self.phase, img["file_name"])
    raw_img = Image.open(image_file_path)
    raw_img = raw_img.convert('RGB')   

    cat_ids = coco.getCatIds() 
    anns_ids = coco.getAnnIds(imgIds=img['id'], catIds=cat_ids, iscrowd=None)
    anns = coco.loadAnns(anns_ids)
    
    # Build one binary mask as the union of all annotations.
    # Starting from zeros avoids counting anns[0] twice, and
    # np.maximum keeps overlapping instances at 0/1.
    # (requires: import numpy as np)
    mask = np.zeros((img['height'], img['width']), dtype=np.uint8)
    for ann in anns:
        mask = np.maximum(mask, coco.annToMask(ann))
    anns_img = Image.fromarray(mask)
        
    raw_img = self.transform(raw_img)
    anns_img = self.transform(anns_img)
    return raw_img, anns_img
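
One caveat about applying self.transform to the mask (an aside, assuming the transform is the usual Resize/ToTensor/Normalize image pipeline): bilinear resizing interpolates the label values, and ToTensor divides a uint8 mask by 255, so label 1 becomes 0.0039 and .long() later rounds it to 0. A mask-safe conversion might look like this sketch:

from PIL import Image
import numpy as np
import torch

def mask_to_tensor(mask_img, size):
    # NEAREST resizing never interpolates label values
    mask_img = mask_img.resize(size, resample=Image.NEAREST)
    # torch.from_numpy instead of ToTensor: ToTensor would rescale a
    # uint8 mask to [0, 1], destroying the integer labels
    return torch.from_numpy(np.array(mask_img)).long()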

Below is part of my training function:

for images, labels in dataloaders_dict[phase]:
    if images.size()[0] == 1:
        continue
    images = images.to(device)
    labels = torch.squeeze(labels)
    labels = labels.to(device)
    if (phase == 'train') and (count == 0):
        optimizer.step()
        optimizer.zero_grad()

    with torch.set_grad_enabled(phase == 'train'):
        outputs = net(images)
        loss = criterion(outputs, labels.long())

But this is not working: the loss drops far too low, and when I run inference on test pictures it classifies every pixel as black, which I guess is non-object/background. I have no idea what to do. Any suggestions?
|| Loss: 0.2654 || 10iter: 45.0074 sec.
|| Loss: 0.0304 || 10iter: 34.7093 sec.
|| Loss: 0.0018 || 10iter: 53.4208 sec.
|| Loss: 0.0001 || 10iter: 34.5700 sec.
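
When the loss collapses toward zero like this and everything is predicted as background, a quick sanity check (a sketch, using the names from the code above) is to look at the values that actually reach the criterion:

images, labels = next(iter(dataloaders_dict['train']))
# Expect integer class ids (0, 1, ...); values like 0.0039 (= 1/255)
# would mean the transform rescaled the mask before .long() zeroed it
print(labels.unique())
print((labels > 0).float().mean())  # fraction of foreground pixels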


I also tried this variation for my dataloader, but with no luck:

        # (uses: import numpy as np
        #  and: from pycocotools.mask import frPyObjects, decode)
        for category in anns:
            seg_rle = category['segmentation']
            # frPyObjects expects (height, width); PIL's .size is (width, height)
            tmp = decode(frPyObjects(seg_rle, raw_img.size[1], raw_img.size[0]))
            if tmp.ndim == 3:
                # polygons decode to HxWxN; take the union so values stay 0/1
                # (np.sum could push overlapping parts above 1)
                tmp = np.max(tmp, axis=2)
            category['segmentation'] = tmp
        for category in anns:
            pilImg = Image.fromarray(category['segmentation'])
            # resize expects (width, height), i.e. raw_img.size; note this loop
            # keeps only the last annotation's mask rather than combining them
            anns_img = pilImg.resize(raw_img.size, resample=Image.NEAREST)
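
For reference, annToMask does essentially the same thing internally (frPyObjects, then merge, then decode), and pycocotools can also union all annotations in one call (a small sketch; raw_img and anns are the names from the snippet above):

from pycocotools import mask as mask_utils

h, w = raw_img.size[1], raw_img.size[0]
rles = [mask_utils.frPyObjects(ann['segmentation'], h, w) for ann in anns]
# a polygon list converts to several RLEs; merge each into one per annotation
rles = [mask_utils.merge(r) if isinstance(r, list) else r for r in rles]
# intersect=False merges by union -> one binary HxW mask for the whole image
combined = mask_utils.decode(mask_utils.merge(rles, intersect=False))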