Hi @all,
I have a more general question concerning evaluation metrics for segmentation. Which ones are commonly used, or which do you see as 'best practice' for researchers? Which ones would you include in publications?
I am talking more about comparability than efficiency. I am aware that the choice of metrics can be task dependent: some applications work best with very specialized or custom methods. I am also aware that the selection of metrics can be predefined by circumstances (e.g. a Kaggle competition). But if I am new to (semantic) segmentation, which metrics should I implement in my code to evaluate the output of my segmentation model?
Here is a non-comprehensive list of metrics that I’ve found (and partially used) so far. I did not always include ‘statistical variations’ like mean/average, median, standard deviation, etc… For a deeper look into the topic see: Taha, Abdel Aziz, and Allan Hanbury. “Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool.” BMC medical imaging 15.1 (2015): 29.
Metrics that seem to be commonly used (e.g. [1] [2])
Based on confusion matrix
- Global / Per-Class Accuracy
- Precision
- Recall
- Intersection over Union (IoU) = Jaccard Index
- F1 score = Sørensen–Dice coefficient (Dice)
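To make the confusion-matrix family concrete, here is a minimal NumPy sketch for the binary case (the function names are my own). It derives all five metrics from the four confusion-matrix counts:

```python
import numpy as np

def confusion_counts(pred, gt):
    """TP, FP, FN, TN for two binary masks of equal shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.sum(pred & gt)
    fp = np.sum(pred & ~gt)
    fn = np.sum(~pred & gt)
    tn = np.sum(~pred & ~gt)
    return tp, fp, fn, tn

def segmentation_metrics(pred, gt):
    tp, fp, fn, tn = confusion_counts(pred, gt)
    return dict(
        accuracy  = (tp + tn) / (tp + fp + fn + tn),
        precision = tp / (tp + fp),
        recall    = tp / (tp + fn),
        iou       = tp / (tp + fp + fn),          # Jaccard index
        dice      = 2 * tp / (2 * tp + fp + fn),  # = F1 score
    )

# Tiny example: 2x4 masks that agree on 6 of 8 pixels
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0]])
gt   = np.array([[1, 0, 0, 0],
                 [1, 1, 0, 0]])
print(segmentation_metrics(pred, gt))
```

Note how IoU and Dice are computed from the same three counts; they are monotonically related (Dice = 2·IoU / (1 + IoU)), which is why many papers report only one of the two.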
Metrics that I rarely encounter
Based on confusion matrix
- Cohen’s kappa / Observed Accuracy
- Global Consistency Error (GCE) / Local Consistency Error (LCE)
Based on pair counting
- Rand Index / Adjusted Rand Index
Based on spatial distance
- Hausdorff Distance and weighted Hausdorff Distance
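For the spatial-distance family, here is a minimal brute-force sketch of the symmetric Hausdorff distance between the foreground pixels of two binary masks (my own implementation; fine for small masks, but in practice one would typically reach for `scipy.spatial.distance.directed_hausdorff`):

```python
import numpy as np

def hausdorff_distance(mask_a, mask_b):
    """Symmetric Hausdorff distance between the foreground
    pixels of two binary masks (brute force, O(n*m))."""
    a = np.argwhere(mask_a)  # (n, 2) foreground coordinates
    b = np.argwhere(mask_b)  # (m, 2)
    # Pairwise Euclidean distances between all coordinate pairs
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Directed distance: the point in one set that is farthest
    # from its nearest neighbour in the other set
    d_ab = d.min(axis=1).max()
    d_ba = d.min(axis=0).max()
    return max(d_ab, d_ba)

# b has one extra foreground pixel at (4, 4), far from a's at (1, 1)
a = np.zeros((5, 5), dtype=int); a[1, 1] = 1
b = np.zeros((5, 5), dtype=int); b[1, 1] = 1; b[4, 4] = 1
print(hausdorff_distance(a, b))
```

Unlike the overlap metrics above, this one is sensitive to a single outlier pixel, which is exactly why averaged or percentile variants of the Hausdorff distance are often preferred.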
There are lots more metrics out there, but did I miss any major ones? Which of these would you suggest for starters? And which ones do you use?