Segmentation of images with multichannel masks

887574002 · November 10, 2020, 4:03pm

Hi, I am working on segmentation of the CamVid dataset. The masks have shape [h,w,3]: instead of a single class label per pixel, each class is represented by a color code. I found code for segmenting this dataset that, in order to use CrossEntropyLoss, converts each mask from shape [h,w,3] to [32,h,w], where 32 is the number of classes. There is also another snippet that converts a mask of shape [32,h,w] back to a 3-channel mask of shape [h,w,3]. After training, when I visualize the predicted masks, the results are not good at all, so I am trying to find where I made a mistake. First I checked the two snippets that convert between an RGB mask and a binary mask. The code is below.

def Color_map(df):
    
    '''
    Builds lookup dictionaries between color codes, class ids and class names.
    Parameters:
        df: Path to a CSV file with a "name" column and the RGB values of each class.
    Returns:
        code2id: A dictionary with color as keys and class id as values.
        id2code: A dictionary with class id as keys and color as values.
        name2id: A dictionary with class name as keys and class id as values.
        id2name: A dictionary with class id as keys and class name as values.
    '''
    cls = pd.read_csv(df)
    
    # this line collects the color code of each class as a tuple
    # len(cls.name) is the number of classes that we have: 32
    # output: [(64, 128, 64), (192, 0, 128), ...]
    color_code = [tuple(cls.drop("name",axis=1).loc[idx]) for idx in range(len(cls.name))]
    
    # maps each color code to a class id
    code2id = {v: k for k, v in enumerate(list(color_code))}
    
    # maps each class id to its color code
    id2code = {k: v for k, v in enumerate(list(color_code))}
    
    # collects the name of each class
    color_name = [cls['name'][idx] for idx in range(len(cls.name))]
    
    # maps each class name to its class id
    name2id = {v: k for k, v in enumerate(list(color_name))}
    
    # maps each class id to its class name
    id2name = {k: v for k, v in enumerate(list(color_name))}  
    
    return(code2id, id2code, name2id, id2name)

def mask_to_rgb(mask, id2code):
    ''' 
        Converts a Binary Mask of shape: [batch_size,num_classes,h,w] 
        to RGB image mask of shape [batch_size, h, w, color_code]
        
        Parameters:
            mask: A Binary (one-hot) mask
            id2code: Dictionary mapping each class id to its color code
        returns:
            out: A RGB mask of shape [batch_size, h, w, color_code]
    '''
    ## Since our mask is one-hot encoded,
    ## argmax over the class axis (axis=1) returns the class of each pixel,
    ## i.e. a label in the range 0-31
    ## result shape: [batch_size, h, w]
    
    single_layer = np.argmax(mask, axis=1)
    
    ## allocate the output of shape [batch_size, h, w, color_code]
    output = np.zeros((mask.shape[0],mask.shape[2],mask.shape[3],3))
    
    for k in id2code.keys():
        
        output[single_layer==k] = id2code[k]
        
    return(output.astype(np.float32))
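For comparison, the same class-to-color step can be sketched as a single palette lookup instead of a loop over classes. This is only a sketch under assumptions: the three-class `id2code` and the name `mask_to_rgb_lookup` are made up for illustration, and it assumes the class ids are exactly 0..n-1, as they are when built with `enumerate` in `Color_map`.

```python
import numpy as np

# Hypothetical three-class palette; the real id2code comes from Color_map.
id2code = {0: (64, 128, 64), 1: (192, 0, 128), 2: (0, 128, 192)}

def mask_to_rgb_lookup(mask, id2code):
    """Convert one-hot masks [batch, classes, h, w] to RGB [batch, h, w, 3]."""
    # Row k of the palette holds the color of class k (assumes ids 0..n-1).
    palette = np.array([id2code[k] for k in range(len(id2code))], dtype=np.uint8)
    # Class index per pixel, shape [batch, h, w].
    class_idx = np.argmax(mask, axis=1)
    # Integer-array indexing replaces every class index with its color row.
    return palette[class_idx]

# One image of 2x2 pixels with classes [[0, 1], [2, 1]].
labels = np.array([[[0, 1], [2, 1]]])
one_hot = np.eye(len(id2code))[labels].transpose(0, 3, 1, 2)
rgb = mask_to_rgb_lookup(one_hot, id2code)
print(rgb.shape)
```

Unlike `mask_to_rgb`, this sketch returns `uint8` rather than `float32`, so the dtype would need to be matched before comparing the two results element by element.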

def rgb_to_mask(img, id2code):
    ''' 
        Converts a single RGB image mask of shape [h, w, color_code], to a mask of shape
        [n_classes,h,w]
        
        Parameters:
            img: A RGB img mask
            id2code: Dictionary representing color mappings: each class is assigned a unique color code
        returns:
            out: A Binary Mask of shape [classes, h, w]
    '''
    
    # num_classes is the number of entries in id2code
    num_classes = len(id2code)
    
    # target shape is (h, w, num_classes), e.g. (720, 960, num_classes)
    shape = img.shape[:2]+(num_classes,)
    
    # allocate an array of zeros with that shape and dtype float64
    out = np.zeros(shape, dtype=np.float64)
    
    # fill one channel per class
    for i in range(num_classes):
        
        # img.reshape((-1,3)) flattens the mask into h*w pixels with 3 channels each.
        # A pixel gets the value 1 in channel i when all three of its channels equal the
        # color code of class i, and 0 otherwise; the flat result is then reshaped back
        # to (height, width).
        out[:,:,i] = np.all(np.array(img).reshape((-1,3)) == id2code[i], axis=1).reshape(shape[:2])
        
        # out has shape (height, width, class)
        # it is returned as (class, height, width)
    return(out.transpose(2,0,1))
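As a side note on the one-hot encoding: PyTorch's `CrossEntropyLoss` also accepts targets given as class indices of shape [batch_size, h, w], so a class-index map is an alternative to a [32, h, w] mask. The sketch below is only an illustration under assumptions: `rgb_to_class_index`, the `unmatched` argument and the three-class `id2code` are hypothetical names, not part of the code I found. It marks pixels whose color is not in the palette with -1 so that they stay visible instead of silently becoming all zeros.

```python
import numpy as np

# Hypothetical three-class palette; the real id2code comes from Color_map.
id2code = {0: (64, 128, 64), 1: (192, 0, 128), 2: (0, 128, 192)}

def rgb_to_class_index(img, id2code, unmatched=-1):
    """Convert an RGB mask [h, w, 3] to a class-index map [h, w]."""
    img = np.asarray(img)
    # Start with every pixel marked as unmatched.
    out = np.full(img.shape[:2], unmatched, dtype=np.int64)
    for class_id, color in id2code.items():
        # True where all three channels equal the color of this class.
        hit = np.all(img == np.array(color), axis=-1)
        out[hit] = class_id
    return out

# 2x2 toy mask; the last pixel uses a color that is not in the palette.
toy = np.array([[[64, 128, 64], [192, 0, 128]],
                [[0, 128, 192], [1, 2, 3]]], dtype=np.uint8)
index_map = rgb_to_class_index(toy, id2code)
print(index_map)
```

If such a map were used as a loss target, the -1 entries would have to be handled explicitly, for example through the `ignore_index` argument of `CrossEntropyLoss`.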

I expect that when I take a mask from the training set, convert it to a binary mask with rgb_to_mask, and then convert it back to an RGB mask with mask_to_rgb, the result is identical to the original mask. But they are not the same, as can be seen in the following code. I do not know where the problem is.

print(f'mask_sample_shape: {mask_sample.shape} mask_sample: {mask_sample.dtype}, mask_type: {type(mask_sample)}')
_, id2code,_,_ = Color_map(os.path.join(path,'class_dict.csv'))
mask_cls = rgb_to_mask(mask_sample, id2code)
print(f'mask_cls: {mask_cls.shape} mask_cls:{mask_cls.dtype} mask_cls_type: {type(mask_cls)}')
## Now converting mask_cls to mask_rgb
mask_rgb = mask_to_rgb(mask_cls[np.newaxis,...], id2code)
mask_rgb = mask_rgb.squeeze(0)
print(f'mask_rgb_shape: {mask_rgb.shape} mask_rgb: {mask_rgb.dtype}, mask_rgb_type: {type(mask_rgb)}')
comparison = mask_rgb==mask_sample
print(comparison.all())

out:
mask_sample_shape: (720, 960, 3) mask_sample: float32, mask_type: <class 'numpy.ndarray'>
mask_cls: (32, 720, 960) mask_cls:float32 mask_cls_type: <class 'numpy.ndarray'>
mask_rgb_shape: (720, 960, 3) mask_rgb: float32, mask_rgb_type: <class 'numpy.ndarray'>
False

Does anyone have any idea where the problem is? I also visualized both masks at the end, and they did not look the same.
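One check that follows from the code above: a pixel whose color matches no entry of id2code gets an all-zero column from rgb_to_mask, np.argmax then returns 0 for it, and mask_to_rgb paints it with the color of class 0, so the round trip cannot reproduce it. The sketch below counts such pixels. It is a minimal sketch under assumptions: `count_unmatched_pixels`, the toy mask and the three-class `id2code` are made up for illustration. Possible causes in real data would be a mask scaled to the range 0-1 or colors blended by interpolation during resizing, but I have not confirmed either for my data.

```python
import numpy as np

# Hypothetical three-class palette; the real id2code comes from Color_map.
id2code = {0: (64, 128, 64), 1: (192, 0, 128), 2: (0, 128, 192)}

def count_unmatched_pixels(img, id2code):
    """Count pixels of an RGB mask [h, w, 3] whose color is not in id2code."""
    img = np.asarray(img)
    matched = np.zeros(img.shape[:2], dtype=bool)
    for color in id2code.values():
        # A pixel matches when all three channels equal the class color.
        matched |= np.all(img == np.array(color), axis=-1)
    return int((~matched).sum())

# 2x2 toy mask in float32; the last pixel is not a palette color.
toy_mask = np.array([[[64, 128, 64], [192, 0, 128]],
                     [[0, 128, 192], [1, 2, 3]]], dtype=np.float32)

# The value range shows whether the mask is still in 0-255.
print(toy_mask.min(), toy_mask.max())
# Unmatched pixels in the original range.
print(count_unmatched_pixels(toy_mask, id2code))
# Unmatched pixels after scaling to 0-1: no pixel matches any color.
print(count_unmatched_pixels(toy_mask / 255.0, id2code))
```

Running the same count on mask_sample would show whether every pixel of the original mask is covered by class_dict.csv; a result greater than zero would be enough to explain the False from comparison.all().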

Thanks