COCO dataset from custom semantic segmentation dataset for detectron2

dip4fish · March 6, 2020, 10:08am

Hello,
I have several datasets, made of pairs of images (greyscaled, groundtruth) looking like this:

[image: yZvLx]
where the groundtruth labels can be decomposed into three binary masks.

These datasets are available, for example, as a NumPy array of shape (N, width, height, comp), or as pairs of PNG images, also available on GitHub.

The project would be to train different semantic/instance segmentation models available in Detectron2 on these datasets. I understand that Detectron2 needs a COCO-formatted dataset to work on.

I would like to build a minimal COCO dataset from a pair of grey+groundtruth (or masks) images.
Is there a tool available in PyTorch for that purpose?

I know there are two libraries (pycococreator, imantics) for this. I haven’t been successful up to now with pycococreator (a draft Colab notebook is available).

For example, the following snippet fails:

from pycococreatortools import pycococreatortools

N = 1
grey = data[N,:,:,0]
labels = data[N,:,:,1]
mask1 = labels == 1
mask2 = labels == 2
mask3 = labels == 3

segmentation_id = "chromosome1_1" 
image_id = "0001"
category_info = "chromosome1"
binary_mask = mask1
image = grey
print(image.size, image.shape)

annotation_info_mask1 = pycococreatortools.create_annotation_info(segmentation_id, image_id, 
                                                              category_info,
                                                              binary_mask,
                                                              image.size, tolerance=2)

ptrblck · March 7, 2020, 4:22am

What kind of error are you getting using the provided code snippet?

dip4fish · March 7, 2020, 11:07am

Hello,

Here’s what I get in the colab notebook:

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

<ipython-input-27-ac798acc8333> in <module>()
      2                                                                   category_info,
      3                                                                   binary_mask,
----> 4                                                                   image.size, tolerance=2)

2 frames

/usr/local/lib/python3.6/dist-packages/PIL/Image.py in resize(self, size, resample, box)
   1866             )
   1867 
-> 1868         size = tuple(size)
   1869 
   1870         if box is None:

TypeError: 'int' object is not iterable

By the way, I also tried imantics, but without an example, it’s not that easy:

import imantics
imantics.Mask.polygons(mask1)

fails, yielding:

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

<ipython-input-28-b97153601dd2> in <module>()
----> 1 imantics.Mask.polygons(mask1)

/usr/local/lib/python3.6/dist-packages/imantics/annotation.py in polygons(self)
    798         :rtype: :class:`Polygons`
    799         """
--> 800         if not self._c_polygons:
    801 
    802             # Generate polygons from mask

AttributeError: 'numpy.ndarray' object has no attribute '_c_polygons'
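
The AttributeError above is what Python reports when an instance method is called on the class itself: the array is then bound to `self`, and the method looks for `_c_polygons` on the array. The sketch below reproduces the mechanism with a hypothetical stand-in class (`ToyMask` is invented for illustration and is not the imantics API). By analogy, wrapping the array first, as in `imantics.Mask(mask1).polygons()`, is probably the intended usage, but that is an assumption to check against the imantics documentation.

```python
class ToyMask:
    """Hypothetical stand-in for a mask wrapper class (not the imantics API)."""

    def __init__(self, array):
        self.array = array
        self._c_polygons = None  # cache, filled on first request

    def polygons(self):
        # Instance method: relies on attributes set in __init__.
        if not self._c_polygons:
            self._c_polygons = "polygons computed from {} rows".format(len(self.array))
        return self._c_polygons


raw = [[0, 1], [1, 1]]  # a plain nested list standing in for the mask array

# Calling the method on the class binds `raw` to `self`, so the attribute
# lookup happens on the list and fails, as in the traceback above.
try:
    ToyMask.polygons(raw)
    unbound_error = None
except AttributeError as exc:
    unbound_error = str(exc)

# Building an instance first gives the method the attributes it expects.
result = ToyMask(raw).polygons()

print(unbound_error)
print(result)
```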

ptrblck · March 8, 2020, 1:49am

Try to pass the desired size to PIL.Image.resize as a tuple instead of a single int to get rid of the first error.
I’m not familiar with imantics.
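
The first traceback is consistent with `image.size` being a plain integer. For a NumPy array, `.size` is the total number of elements, whereas a PIL image's `.size` is a `(width, height)` tuple. A minimal sketch of the difference, assuming `grey` is a 2D array indexed as (rows, columns); the thread describes the data as (N, width, height, comp), so swap the two indices if that really is the layout:

```python
import numpy as np

# Toy greyscale image: 3 rows (height) by 5 columns (width).
grey = np.zeros((3, 5), dtype=np.uint8)

n_elements = grey.size   # total element count, a single int
shape = grey.shape       # tuple of dimensions (rows, columns)

# PIL's resize calls tuple(size); with an int this raises the same
# TypeError as in the traceback.
try:
    tuple(grey.size)
    size_error = None
except TypeError as exc:
    size_error = str(exc)

# A (width, height) tuple, the order PIL uses for image sizes.
pil_size = (grey.shape[1], grey.shape[0])

print(n_elements, shape, pil_size)
print(size_error)
```

Passing a tuple such as `pil_size` instead of `image.size` should remove the TypeError. As far as I recall, the pycococreator examples also pass `category_info` as a dict with `id` and `is_crowd` keys rather than a string, so that argument may need attention next; treat this as an assumption to verify.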

dip4fish · August 7, 2020, 11:55am

Hello,
Sorry for the very delayed answer. The discussion is updated. Here I tried to generate a legitimate COCO dataset using only pycocotools.
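
For reference, a minimal sketch of what a COCO-style dictionary for one image and its three masks could look like. It uses only NumPy and the standard library, not pycocotools, and stores each mask as uncompressed RLE (column-major run lengths, starting with the zeros). The helper names `mask_to_uncompressed_rle` and `mask_to_annotation`, the file name, and the category names are hypothetical, and the masks are assumed to be indexed as (rows, columns). Whether Detectron2 accepts this exact output should be checked against its COCO loader.

```python
import json
import numpy as np


def mask_to_uncompressed_rle(mask):
    """Encode a 2D binary mask as COCO-style uncompressed RLE."""
    flat = np.asarray(mask, dtype=np.uint8).ravel(order="F")  # column-major
    # Indices where the value changes mark the run boundaries.
    change = np.flatnonzero(flat[1:] != flat[:-1]) + 1
    bounds = np.concatenate(([0], change, [flat.size]))
    counts = np.diff(bounds).tolist()
    if flat.size and flat[0] == 1:
        counts = [0] + counts  # the first run always counts zeros
    return {"size": [int(mask.shape[0]), int(mask.shape[1])], "counts": counts}


def mask_to_annotation(mask, annotation_id, image_id, category_id):
    """Build one annotation dict; the mask must contain at least one pixel."""
    ys, xs = np.nonzero(mask)
    x0, y0 = int(xs.min()), int(ys.min())
    width, height = int(xs.max()) - x0 + 1, int(ys.max()) - y0 + 1
    return {
        "id": annotation_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": mask_to_uncompressed_rle(mask),
        "area": int(mask.sum()),
        "bbox": [x0, y0, width, height],  # COCO order: x, y, width, height
        "iscrowd": 0,
    }


# Toy label image: 4 rows (height) by 5 columns (width), labels 1 to 3.
labels = np.zeros((4, 5), dtype=np.uint8)
labels[1:3, 1:3] = 1
labels[0, 4] = 2
labels[3, 0:2] = 3

coco = {
    "images": [{"id": 1, "file_name": "0001.png", "height": 4, "width": 5}],
    # Category names are placeholders for illustration.
    "categories": [{"id": k, "name": "class{}".format(k)} for k in (1, 2, 3)],
    "annotations": [],
}

for category_id in (1, 2, 3):
    mask = labels == category_id
    if mask.any():  # skip labels absent from this image
        annotation_id = len(coco["annotations"]) + 1
        coco["annotations"].append(
            mask_to_annotation(mask, annotation_id, 1, category_id)
        )

coco_json = json.dumps(coco)
print(coco_json)
```

Each run-length list sums to height × width, which is a quick sanity check on the encoding.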