Calculate per-pixel level ROCAUC: unsupervised segmentation

I’m trying to evaluate the performance of an unsupervised detection model from a list of ground-truth masks and a list of predicted scores:

fpr, tpr, _ = roc_curve(mask_list, y_score)
per_pixel_rocauc = roc_auc_score(mask_list, y_score)
total_pixel_roc_auc.append(per_pixel_rocauc)
print('%s pixel ROCAUC: %.3f' % ('ROC', per_pixel_rocauc))
fig_pixel_rocauc.plot(fpr, tpr, label='%s ROCAUC: %.3f' % ('ROC', per_pixel_rocauc))

but I get the error below:

ValueError: unknown format is not supported

How can I solve that?

This error is most likely raised by one of the scikit-learn methods you are using, so check the types of all inputs and make sure they match what these methods expect.
Unfortunately, I can’t tell from the snippet which method raises it.

This is my dataset class:

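# Imports this snippet relies on (assumed; not shown in the original post)
import os

import torch
import torchvision.transforms as T
from PIL import Image
from torch.utils.data import Dataset


class Dataset(Dataset):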

    def __init__(self, dataset_path='./data/CT',  is_train=True, resize=64, cropsize=64):
        
        self.dataset_path = dataset_path
        self.is_train = is_train
        self.resize = resize
        self.cropsize = cropsize

        
        self.x, self.y, self.mask = self.load_dataset_folder()

        
        self.transform_x = T.Compose([T.Resize(resize),
                                      T.CenterCrop(cropsize),         
                                      T.Grayscale(channel),
                                      T.ToTensor(),
                                      T.Normalize(mean=[0.5, 0.5, 0.5],
                                                  std=[0.5, 0.5, 0.5])])
        
        self.transform_mask = T.Compose([T.Resize(resize),
                                         T.CenterCrop(cropsize),
                                         T.Grayscale(channel),
                                         T.ToTensor()])
        


    #Function to obtain the     
    def __getitem__(self, idx):
        x, y, mask = self.x[idx], self.y[idx], self.mask[idx]

        x = Image.open(x)
        x = self.transform_x(x)

        if y == 0:
            mask = torch.zeros([1, self.cropsize, self.cropsize])

        else:
            mask = Image.open(mask)
            mask = self.transform_mask(mask)

        return x, y, mask
        


    def __len__(self):
        return len(self.x)


    def load_dataset_folder(self):
        phase = 'train' if self.is_train else 'test'
        
        #lists of images, labels and masks 
        x, y, mask = [], [], []

        img_dir = os.path.join(self.dataset_path, phase)
        gt_dir = os.path.join(self.dataset_path, 'ground_truth')

        img_types = sorted(os.listdir(img_dir))
        
        for img_type in img_types:
            
            img_type_dir = os.path.join(img_dir,  img_type)     
            if not os.path.isdir(img_type_dir):
                continue

            img_fpath_list = sorted([os.path.join(img_type_dir, f) for f in os.listdir(img_type_dir) ])

            x.extend(img_fpath_list)
                # load gt labels

            if img_type == 'normal':
                y.extend([0] * len(img_fpath_list))
                mask.extend([None] * len(img_fpath_list))
            
            else:
                y.extend([1] * len(img_fpath_list))
                
                gt_type_dir = os.path.join(gt_dir, img_type)
                #img_fname_list = [os.path.splitext(os.path.basename(f))[0] for f in img_fpath_list]
                
                gt_fpath_list = sorted([os.path.join(gt_type_dir, d) for d in os.listdir(gt_type_dir) ])

                mask.extend(gt_fpath_list)             
       
     #   assert len(x) == len(y), 'number of x and y should be same'
        return list(x), list(y), list(mask)

How can I solve the problem?

I don’t know, as your code snippet is not executable and I cannot debug it.
Were you able to check the input types used in these methods?

I am using the code below, and I get an image and the corresponding mask:

test_img, test_lb, test_mask = next(iter(train_dataloader))

plt.figure()
# testdata
plt.subplot(1, 2, 1)
plt.imshow(test_img[0][0].cpu().detach().numpy()*255)
plt.title('Real Image')

plt.subplot(1, 2, 2)
plt.imshow(test_mask[0][0].cpu().detach().numpy().astype(np.uint8)*255)
plt.title('Mask')
    
plt.savefig('./Image_Mask.jpg')

plt.show()

How can I check them as well?

I’m not sure how the new code snippet is related to the initial question.
In the first post you said that this code snippet raises the issue:

fpr, tpr, _ = roc_curve(mask_list, y_score)
per_pixel_rocauc = roc_auc_score(mask_list, y_score)
total_pixel_roc_auc.append(per_pixel_rocauc)
print('%s pixel ROCAUC: %.3f' % ('ROC', per_pixel_rocauc))
fig_pixel_rocauc.plot(fpr, tpr, label='%s ROCAUC: %.3f' % ('ROC', per_pixel_rocauc))

which is most likely calling into scikit-learn methods.
To isolate the issue, check all inputs to these methods, e.g. via:

print(type(mask_list))
print(type(y_score))
fpr, tpr, _ = roc_curve(mask_list, y_score)

if roc_curve raises the error. If not, i.e. if the error is raised in another line of code, check the inputs to that particular method, then check the scikit-learn docs and make sure the inputs have the right type, shape, etc.
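
Since both inputs may turn out to be plain Python lists, printing the outer type alone might not reveal much. The sketch below goes one level deeper and reports the shape, dtype, and value range of the individual list entries. The helper name describe_items and the dummy data are hypothetical and only stand in for the real mask_list and y_score:

```python
import numpy as np


def describe_items(name, items, limit=3):
    """Print shape/dtype/range of the first few entries and
    return the set of distinct (shape, dtype) pairs in the list."""
    seen = set()
    for i, item in enumerate(items):
        arr = np.asarray(item)  # works for NumPy arrays and CPU tensors
        seen.add((arr.shape, str(arr.dtype)))
        if i < limit:
            print(f"{name}[{i}]: shape={arr.shape}, dtype={arr.dtype}, "
                  f"min={arr.min()}, max={arr.max()}")
    print(f"{name}: {len(items)} items, "
          f"{len(seen)} distinct (shape, dtype) combinations")
    return seen


# Dummy data (assumption): two masks with different shapes, two score maps
mask_list = [np.zeros((1, 64, 64)), np.zeros((1, 32, 32))]
y_score = [np.random.rand(64, 64), np.random.rand(64, 64)]

mask_info = describe_items("mask_list", mask_list)
score_info = describe_items("y_score", y_score)
```

If a list reports more than one distinct shape, NumPy cannot stack it into a regular array, which is one way to end up with inputs that scikit-learn does not recognize.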

print(type(mask_list))
print(type(y_score))

<class 'list'>
<class 'list'>

fpr, tpr, _ = roc_curve(gt_mask_list, score_list)

/usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-b78d2675ad4d> in <module>()
----> 1 fpr, tpr, _ = roc_curve(gt_mask_list, score_list)

1 frames
/usr/local/lib/python3.7/dist-packages/sklearn/metrics/_ranking.py in _binary_clf_curve(y_true, y_score, pos_label, sample_weight)
    534     if not (y_type == "binary" or
    535             (y_type == "multiclass" and pos_label is not None)):
--> 536         raise ValueError("{0} format is not supported".format(y_type))
    537 
    538     check_consistent_length(y_true, y_score, sample_weight)

ValueError: unknown format is not supported
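
One possible reading of this output, offered as a sketch and not a confirmed diagnosis: the VisibleDeprecationWarning says the list was converted into a ragged array, which suggests that the entries of gt_mask_list or score_list do not all share the same shape. roc_curve and roc_auc_score expect 1D arrays with one binary label and one score per sample, so for a per-pixel metric every mask and score map would need to be flattened first. The dummy data and the 0.5 threshold below are assumptions for illustration only:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)

# Dummy stand-ins (assumption): one mask and one score map per image
gt_mask_list = [np.zeros((1, 64, 64), dtype=np.float32) for _ in range(4)]
for m in gt_mask_list[2:]:
    m[0, 16:32, 16:32] = 1.0  # square "anomaly" region
score_list = [rng.random((64, 64)) + m[0] for m in gt_mask_list]

# Flatten each entry separately, then join into one value per pixel
y_true = np.concatenate([np.asarray(m).reshape(-1) for m in gt_mask_list])
y_pred = np.concatenate([np.asarray(s).reshape(-1) for s in score_list])

# Resized masks can contain values between 0 and 1; binarize them
# (threshold of 0.5 is an assumption)
y_true = (y_true > 0.5).astype(np.int64)

# Both arrays must have exactly one entry per pixel
assert y_true.shape == y_pred.shape

fpr, tpr, _ = roc_curve(y_true, y_pred)
per_pixel_rocauc = roc_auc_score(y_true, y_pred)
print('pixel ROCAUC: %.3f' % per_pixel_rocauc)
```

If the shape assertion fails on the real data, the masks and score maps do not cover the same number of pixels, and that mismatch would need to be resolved (for example by resizing both to the same resolution) before computing the metric.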