How to calculate binary segmentation metrics?

I am working on a binary segmentation problem using PyTorch. I want to know the correct way to calculate metrics such as precision, recall, F1-score, and mIoU for my test set. Code samples available online calculate them in different ways, which I have listed below:
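
For reference, these are the usual pixel-level definitions of the metrics in terms of the true positive, false positive, true negative, and false negative counts (TP, FP, TN, FN). For binary segmentation, mIoU is commonly taken as the mean of the foreground IoU and the background IoU, TN / (TN + FP + FN). Some code bases report only the foreground IoU, so it is worth checking which convention a given implementation follows.

```latex
\begin{aligned}
\text{Precision} &= \frac{TP}{TP + FP}, &
\text{Recall} &= \frac{TP}{TP + FN}, \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN}, &
\text{IoU} &= \frac{TP}{TP + FP + FN}, \\
\text{mIoU} &= \frac{1}{2}\left(\frac{TP}{TP + FP + FN} + \frac{TN}{TN + FP + FN}\right)
\end{aligned}
```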

Method 1
  • Calculate the metrics for each image separately.
  • Sum each metric over all test images.
  • Divide each sum by the number of test images.

Example:

Score = 0.0
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...  # (elided: produces the binary prediction pred1 and ground truth mask1)
    Score += calc_metrics(mask1, pred1)  # Score could be any of the metrics
Final_Score = Score / len(test_x)
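
A minimal, self-contained sketch of Method 1, assuming the masks and predictions are 0/1 NumPy arrays (PyTorch tensors can be converted with `.cpu().numpy()`). The names `per_image_metrics` and `method1` are hypothetical stand-ins for `calc_metrics` above, and the small `eps` term is one assumed way of keeping the ratios defined when an image has no foreground pixels; other code bases treat empty masks differently, which can shift the average noticeably.

```python
import numpy as np


def per_image_metrics(mask, pred, eps=1e-7):
    """Precision, recall, F1 and foreground IoU for one binary image.

    eps (an assumption) keeps the ratios defined when a denominator is 0.
    """
    mask = np.asarray(mask).astype(bool)
    pred = np.asarray(pred).astype(bool)
    tp = np.sum(pred & mask)
    fp = np.sum(pred & ~mask)
    fn = np.sum(~pred & mask)
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)
    iou = tp / (tp + fp + fn + eps)
    return np.array([precision, recall, f1, iou])


def method1(masks, preds):
    """Method 1: metrics per image, then the mean over all test images."""
    scores = [per_image_metrics(m, p) for m, p in zip(masks, preds)]
    return np.mean(scores, axis=0)  # [precision, recall, F1, IoU]


# Two toy "images" flattened to 1-D: a large object segmented perfectly,
# and a single-pixel object missed by one pixel.
masks = [np.array([1] * 10 + [0] * 2), np.array([1] + [0] * 11)]
preds = [np.array([1] * 10 + [0] * 2), np.array([0, 1] + [0] * 10)]
print(method1(masks, preds).round(3))  # -> [0.5 0.5 0.5 0.5]
```

Each image counts equally here, so the perfectly segmented large object and the missed tiny object pull the average to 0.5 regardless of their sizes.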

Method 2
  • Add up the TP, FP, TN, and FN pixel counts over all images to build a single confusion matrix.
  • Calculate all metrics once at the end from these total counts.

Example:

import numpy as np

TP = FP = TN = FN = 0.0
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...  # (elided: produces the binary prediction pred1 and ground truth mask1)
    FP += float(np.sum((pred1 == 1) & (mask1 == 0)))
    FN += float(np.sum((pred1 == 0) & (mask1 == 1)))
    TP += float(np.sum((pred1 == 1) & (mask1 == 1)))
    TN += float(np.sum((pred1 == 0) & (mask1 == 0)))
Score = calc_metrics(TP, FP, TN, FN)
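
A runnable sketch of Method 2 under the same assumptions (0/1 NumPy arrays; `confusion_counts`, `metrics_from_counts` and `method2` are hypothetical names). It pools the pixel counts over the whole test set before computing any ratio, so large objects carry more weight than small ones. On the toy data below, pooling gives an IoU of 10/12 ≈ 0.833, whereas averaging the per-image IoU of the same two images gives 0.5.

```python
import numpy as np


def confusion_counts(mask, pred):
    """Return the (TP, FP, TN, FN) pixel counts for one binary image."""
    mask = np.asarray(mask).astype(bool)
    pred = np.asarray(pred).astype(bool)
    return (int(np.sum(pred & mask)), int(np.sum(pred & ~mask)),
            int(np.sum(~pred & ~mask)), int(np.sum(~pred & mask)))


def metrics_from_counts(tp, fp, tn, fn):
    """Precision, recall, F1 and foreground IoU; 0.0 when undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


def method2(masks, preds):
    """Method 2: sum the counts over all images, compute the metrics once."""
    tp = fp = tn = fn = 0
    for m, p in zip(masks, preds):
        c = confusion_counts(m, p)
        tp, fp, tn, fn = tp + c[0], fp + c[1], tn + c[2], fn + c[3]
    return metrics_from_counts(tp, fp, tn, fn)


masks = [np.array([1] * 10 + [0] * 2), np.array([1] + [0] * 11)]
preds = [np.array([1] * 10 + [0] * 2), np.array([0, 1] + [0] * 10)]
pooled_iou = method2(masks, preds)[3]
per_image_iou = np.mean([metrics_from_counts(*confusion_counts(m, p))[3]
                         for m, p in zip(masks, preds)])
print(round(pooled_iou, 3), round(per_image_iou, 3))  # -> 0.833 0.5
```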

Method 3

Calculate the metrics batch by batch, sum them, and divide the total by the number of test images at the end.
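
One possible reading of Method 3, sketched under stated assumptions: the arrays are stacked along the first axis with 0/1 values, each batch gets a single IoU pooled over its pixels, and the per-batch scores are averaged over the number of batches. `batch_iou` and `method3` are hypothetical names. On the toy data, a batch size of 1 reproduces the per-image average of Method 1, while one batch holding the whole test set reproduces the pooled result of Method 2, so the reported number depends on the batch size. Dividing a sum of per-batch scores by the number of images, rather than the number of batches, would additionally shrink the result by roughly a factor equal to the batch size.

```python
import numpy as np


def batch_iou(masks, preds):
    """Foreground IoU pooled over every pixel of one batch."""
    masks = masks.astype(bool)
    preds = preds.astype(bool)
    union = np.sum(preds | masks)  # equals TP + FP + FN
    return np.sum(preds & masks) / union if union else 0.0


def method3(masks, preds, batch_size):
    """One IoU per batch, then the mean over the batches."""
    scores = [batch_iou(masks[i:i + batch_size], preds[i:i + batch_size])
              for i in range(0, len(masks), batch_size)]
    return float(np.mean(scores))


masks = np.array([[1] * 10 + [0] * 2, [1] + [0] * 11])
preds = np.array([[1] * 10 + [0] * 2, [0, 1] + [0] * 10])
print(method3(masks, preds, 1), round(method3(masks, preds, 2), 3))  # -> 0.5 0.833
```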

Method 4

Unlike all the above methods, which binarize the predicted mask (after applying a sigmoid) with a fixed threshold of 0.5, this method computes each metric at a range of thresholds and takes the mean of these values at the end.

Example:

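import numpy as np

thresh_final = np.zeros((len(test_x), len(threshold)))
for i, (x, y) in enumerate(zip(test_x, test_y)):
    ...  # (elided: produces the sigmoid probability map pred1 and ground truth mask1)
    thresh_metric = np.zeros(len(threshold))
    for t in range(len(threshold)):
        thresh = threshold[t]
        thresh_metric[t] = calc_metrics(mask1, pred1, thresh)
    thresh_final[i, :] = thresh_metric
# Mean over the thresholds for each image, then average over the test images
Score = np.sum(np.mean(thresh_final, axis=1)) / len(test_x)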
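
A minimal sketch of Method 4 for a single metric (foreground IoU), assuming `probs` holds sigmoid probabilities as NumPy arrays and that a pixel counts as foreground when its probability is at least the threshold. `iou_at_threshold` and `method4` are hypothetical names, and the thresholds in the example are arbitrary values chosen for illustration. For each image the IoU is averaged over all thresholds, and those per-image values are then averaged over the test set.

```python
import numpy as np


def iou_at_threshold(mask, prob, thresh):
    """Foreground IoU after binarizing the probability map at `thresh`."""
    pred = np.asarray(prob) >= thresh  # assumed convention: >= is foreground
    mask = np.asarray(mask).astype(bool)
    union = np.sum(pred | mask)
    return np.sum(pred & mask) / union if union else 0.0


def method4(masks, probs, thresholds):
    """Mean over the thresholds for each image, then mean over the images."""
    per_image = [np.mean([iou_at_threshold(m, p, t) for t in thresholds])
                 for m, p in zip(masks, probs)]
    return float(np.mean(per_image))


mask = np.array([1, 1, 0, 0])
prob = np.array([0.9, 0.6, 0.4, 0.1])
print(iou_at_threshold(mask, prob, 0.5))                    # -> 1.0
print(round(method4([mask], [prob], [0.3, 0.5, 0.7]), 4))  # -> 0.7222
```

In this example the prediction is perfect at 0.5 but scores 2/3 at 0.3 and 1/2 at 0.7, so the threshold-averaged value (13/18) is lower than the fixed-threshold one.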

I am confused about which of these methods I should use to report my model’s results on the test set.