How do I determine the precision, recall and F-measure of a binary segmentation?

I’ve trained a model that tries to segment objects of interest in an image. I want to evaluate its performance by determining the precision, recall and F-measure of the network output.

I know that precision = tp/(tp+fp) and recall = tp/(tp+fn). I found that sklearn.metrics has several functions for computing these values, but I can't seem to use them correctly.

At first I wrote a script that found the tp, fp and fn values and returned them as plain integer counts. For example, in one sample the ground truth contains 2 objects of interest but the network detected 4, which gives:

tp = 2, fp = 2 and fn = 0
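
For reference, the formulas above can be applied to these counts directly, without sklearn. This is only a minimal sketch: the function name `precision_recall_f1` is made up for illustration, and returning 0.0 when a denominator is zero is an assumed convention, not something sklearn requires.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw integer counts.

    Assumption: a metric whose denominator is zero is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Counts from the sample above: 2 correct detections, 2 spurious ones, none missed.
precision, recall, f1 = precision_recall_f1(tp=2, fp=2, fn=0)
print(precision, recall, round(f1, 3))  # 0.5 1.0 0.667
```

To score a whole dataset this way, the per-sample counts would probably be summed first and the formulas applied once to the totals.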

I can't use these counts directly, because sklearn.metrics expects arrays as inputs, so I rewrote my script to produce the following arrays:

gt_array = [1, 1]
pred_array = [1, 0, 1, 0]

as two of the predictions were correct while the other two were misclassified objects.

If I pass these arrays to either of the following functions:

metrics.classification_report(gt_array, pred_array)


metrics.precision_recall_fscore_support(gt_array, pred_array,average='binary')

I get the error:

{ValueError}Multi-label binary indicator input with different numbers of labels

I understand why the error is raised, but I had assumed the function would automatically infer the fn and fp values from the difference in array sizes.

So how do I determine these metrics when the arrays have different lengths? In some samples the ground truth has 10 detections but the network output only contains 5. How do I incorporate the fn values into the functions above in that case?
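
One possible way to get equal-length inputs, sketched under the assumption that a matching step has already produced tp, fp and fn counts per sample: give every object one entry in both arrays, so a matched detection is (true=1, pred=1), a spurious detection is (true=0, pred=1) and a missed object is (true=1, pred=0). The helper names `counts_to_label_arrays` and `sklearn_scores` are hypothetical; `precision_recall_fscore_support` and its `average` and `zero_division` parameters are the real sklearn API.

```python
def counts_to_label_arrays(tp, fp, fn):
    """Expand counts into aligned label lists of equal length.

    Entry order: tp matched objects, then fp spurious detections,
    then fn missed ground-truth objects.
    """
    y_true = [1] * tp + [0] * fp + [1] * fn
    y_pred = [1] * tp + [1] * fp + [0] * fn
    return y_true, y_pred


def sklearn_scores(tp, fp, fn):
    """Feed the aligned lists to sklearn (imported here so the helper
    above stays usable without sklearn installed)."""
    from sklearn.metrics import precision_recall_fscore_support

    y_true, y_pred = counts_to_label_arrays(tp, fp, fn)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="binary", zero_division=0
    )
    return precision, recall, f1


# The case from the question: 10 ground-truth objects, only 5 detected.
y_true, y_pred = counts_to_label_arrays(tp=5, fp=0, fn=5)
print(y_true)  # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
print(y_pred)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

With this encoding both lists always have length tp + fp + fn, so the size mismatch cannot occur. True negatives are never represented, which is fine here because neither precision nor recall uses them.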