The prediction results do not match the computed evaluation metrics in binary segmentation

Dear researchers,

My task is binary segmentation, and I have built a VNet-like network for segmenting seismic data. During training, the behavior of the loss values indicated that the model converged. I then tested the trained model on a separate dataset and obtained Dice, recall, and precision values all equal to 1, which I found very strange. Note that I used the metric functions from the MedPy package to compute the evaluation measures. However, when I display the prediction results, the whole image is black. What might be the problem, or what could be wrong with my model? Any suggestions and comments would be highly appreciated.

Thank you for your time

Could you post the evaluation code here as well as the shapes of your model output and target?
I guess the eval method might expect another format and thus the metric calculation might be wrong.
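As a quick sanity check (just a sketch; prediction and target here stand for whatever you pass to the MedPy functions), something like this usually reveals a format mismatch right away:

import numpy as np

# prediction, target: the two arrays you pass to medpy.metric.binary.*
print(prediction.shape, target.shape)            # must describe the same volume
print(prediction.dtype, target.dtype)            # binary masks, not raw scores or logits
print(np.unique(prediction), np.unique(target))  # ideally only {0, 1} / {False, True}
print(prediction.sum(), target.sum())            # number of foreground voxels in each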

Thank you very much for your time and my apologies for the late reply. The evaluation function is quite long.

import h5py
import math
import nibabel as nib
import numpy as np
from medpy import metric
import torch
import torch.nn.functional as F
from tqdm import tqdm


def test_all_case(net, image_list, num_classes, patch_size=(112, 112, 80), stride_xy=18, stride_z=4, save_result=True, test_save_path=None, preproc_fn=None):
    dice = []
    asd = []
    precision = []
    recall = []

    for image_path in tqdm(image_list):
        id = image_path.split('/')[-1]
        h5f = h5py.File(image_path, 'r')
        image = h5f['image'][:]
        label = h5f['label'][:]
        if preproc_fn is not None:
            image = preproc_fn(image)
            label = preproc_fn(label)
            
        prediction, score_map = test_single_case(net, image, stride_xy, stride_z, patch_size, num_classes=num_classes)
        print(prediction.shape)
        print(label.shape)
        # MedPy expects (result, reference); both arrays are treated as binary masks
        dice.append(metric.binary.dc(prediction, label))
        asd.append(metric.binary.asd(prediction, label))
        precision.append(metric.binary.precision(prediction, label))
        recall.append(metric.binary.recall(prediction, label))
            
    print("dice: {:.5f}:".format(np.mean(dice)))  
    print("asd: {:.5f}:".format(np.mean(asd))) 
    print("precison: {:.5f}:".format(np.mean(precison))) 
    print("recall: {:.5f}:".format(np.mean(recall)))          

def test_single_case(net, image, stride_xy, stride_z, patch_size, num_classes=1):
    w, h, d = image.shape

    # if the size of image is less than patch_size, then padding it
    add_pad = False
    if w < patch_size[0]:
        w_pad = patch_size[0]-w
        add_pad = True
    else:
        w_pad = 0
    if h < patch_size[1]:
        h_pad = patch_size[1]-h
        add_pad = True
    else:
        h_pad = 0
    if d < patch_size[2]:
        d_pad = patch_size[2]-d
        add_pad = True
    else:
        d_pad = 0
    wl_pad, wr_pad = w_pad//2,w_pad-w_pad//2
    hl_pad, hr_pad = h_pad//2,h_pad-h_pad//2
    dl_pad, dr_pad = d_pad//2,d_pad-d_pad//2
    if add_pad:
        image = np.pad(image, [(wl_pad,wr_pad),(hl_pad,hr_pad), (dl_pad, dr_pad)], mode='constant', constant_values=0)
    ww,hh,dd = image.shape

    sx = math.ceil((ww - patch_size[0]) / stride_xy) + 1
    sy = math.ceil((hh - patch_size[1]) / stride_xy) + 1
    sz = math.ceil((dd - patch_size[2]) / stride_z) + 1
    print("{}, {}, {}".format(sx, sy, sz))
    score_map = np.zeros((num_classes, ) + image.shape).astype(np.float32)
    cnt = np.zeros(image.shape).astype(np.float32)

    for x in range(0, sx):
        xs = min(stride_xy*x, ww-patch_size[0])
        for y in range(0, sy):
            ys = min(stride_xy * y,hh-patch_size[1])
            for z in range(0, sz):
                zs = min(stride_z * z, dd-patch_size[2])
                test_patch = image[xs:xs+patch_size[0], ys:ys+patch_size[1], zs:zs+patch_size[2]]
                test_patch = np.expand_dims(np.expand_dims(test_patch,axis=0),axis=0).astype(np.float32)
                test_patch = torch.from_numpy(test_patch).cuda()
                out = net(test_patch)  # renamed from y to avoid shadowing the loop variable
                out = out.detach().cpu().numpy()
                out = out[0, :, :, :, :]  # drop the batch dimension
                score_map[:, xs:xs+patch_size[0], ys:ys+patch_size[1], zs:zs+patch_size[2]] \
                  = score_map[:, xs:xs+patch_size[0], ys:ys+patch_size[1], zs:zs+patch_size[2]] + out
                cnt[xs:xs+patch_size[0], ys:ys+patch_size[1], zs:zs+patch_size[2]] \
                  = cnt[xs:xs+patch_size[0], ys:ys+patch_size[1], zs:zs+patch_size[2]] + 1
    score_map = score_map/np.expand_dims(cnt, axis=0)  # average the overlapping patch predictions
    label_map = score_map[0, :, :, :] > 0.5  # threshold channel 0 (sigmoid probabilities)
    if add_pad:
        label_map = label_map[wl_pad:wl_pad+w,hl_pad:hl_pad+h,dl_pad:dl_pad+d]
        score_map = score_map[:,wl_pad:wl_pad+w,hl_pad:hl_pad+h,dl_pad:dl_pad+d]
    return label_map, score_map

The shapes of the prediction and label are both (256, 256, 256). For the record, I trained the model with patches of size (128, 128, 128).
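For reference, a call to the evaluation looks roughly like this (the list file name, stride values, and num_classes below are placeholders rather than my exact settings):

with open('test.list', 'r') as f:  # hypothetical file listing the test volume paths
    image_list = [line.strip() for line in f]
test_all_case(net, image_list, num_classes=1, patch_size=(128, 128, 128),
              stride_xy=64, stride_z=64, save_result=False)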

Thank you for your time, guidance, and suggestions.

Based on the code I assume you are using sigmoid as your last non-linearity?
How are you displaying the prediction result?

I’m trying to figure out whether the metric calculation is wrong (a perfect score is always suspicious) or whether the visualization just displays the prediction in the wrong way.
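Two quick checks you could run (a sketch only, reusing the names and the np import from your test_all_case; I'm assuming matplotlib is available for the slice plot):

# 1) Metric side: as far as I know, MedPy's binary metrics cast both inputs to
#    boolean, so every nonzero voxel counts as foreground. If preproc_fn does any
#    intensity normalization and is also applied to the label, the label can end up
#    almost entirely nonzero and the scores become meaningless.
print("label values:", np.unique(label))
print("foreground voxels -> pred:", prediction.sum(), "label:", (label != 0).sum())

# 2) Visualization side: display one slice of the thresholded prediction directly
import matplotlib.pyplot as plt
plt.imshow(prediction[:, :, prediction.shape[2] // 2], cmap='gray', vmin=0, vmax=1)
plt.title('prediction, middle slice')
plt.show()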

Yes sir, I am using sigmoid as my last non-linearity. I have been redoing everything from scratch to see whether I can find where I went wrong. I am still on it.

Really appreciate your help and time