I am trying to calculate the F1 score (and accuracy) for my multi-label classification problem. Could you please give feedback on whether I'm calculating it correctly? Note that I compute the IoU (intersection over union) when the model predicts an object as 1, and mark it as a TP only if the IoU is greater than or equal to 0.5.

GT labels: 14 x 10 x 128
Output: 14 x 10 x 128

where 14 is the batch_size, 10 is the sequence_length, and 128 is the object vector (i.e., 1 if the object at an index belongs to the sequence and 0 otherwise).

for epoch in range(10):
    TP = FP = TN = FN = 0
    EPOCH_PRECISION = EPOCH_RECALL = EPOCH_F1 = 0.
    for inputs, gt_labels in tr_dl:
        out = model(inputs)  # out shape: (14, 10, 128)
        # loop through batch samples
        for batch_idx, batch_output in enumerate(out):
            # loop through sequence elements
            # batch_output shape: (10, 128)
            for seq_idx, seq_output in enumerate(batch_output):
                # pred_labels shape: (128,)
                # consider all predictions with sigmoid >= 0.5 as 1, the rest as 0
                pred_labels = (torch.sigmoid(seq_output) >= 0.5).long()
                # loop through predicted objects for a sequence element
                for object_idx, pred_label in enumerate(pred_labels):
                    if pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        # calculate IoU (overlap of predicted and GT bounding boxes)
                        iou = 0.78  # assume we get this IoU value for the objects at this index
                        if iou >= 0.5:
                            TP += 1
                        else:
                            FP += 1
                    elif pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 0:
                        FP += 1
                    elif pred_label == 0 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        FN += 1
                    else:
                        TN += 1
    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
    if TP + FP > 0:
        EPOCH_PRECISION = TP / (TP + FP)
    if TP + FN > 0:
        EPOCH_RECALL = TP / (TP + FN)
    if EPOCH_PRECISION + EPOCH_RECALL > 0:
        EPOCH_F1 = (2 * EPOCH_PRECISION * EPOCH_RECALL) / (EPOCH_PRECISION + EPOCH_RECALL)

I looked at your code snippet, and I believe there might be some confusion. I have two comments for you:

The implementation is really inefficient. You can compute the statistics without looping over the dimensions. All the operations you need are sum, element-wise multiplication, and negation. I leave this here as a tip; if you feel you need help, I can write some code for you.
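As a sketch of the idea (ignoring the IoU check and the padding for a moment; the tensor names here are illustrative, not from your code):

```python
import torch

# toy stand-ins for the real tensors; shape (batch, seq, objects)
gt = torch.tensor([[[1, 0, 1, 0]]]).float()    # ground-truth 0/1 labels
pred = torch.tensor([[[1, 1, 0, 0]]]).float()  # predictions already thresholded to 0/1

tp = (pred * gt).sum()              # predicted 1, GT 1
fp = (pred * (1 - gt)).sum()        # predicted 1, GT 0
fn = ((1 - pred) * gt).sum()        # predicted 0, GT 1
tn = ((1 - pred) * (1 - gt)).sum()  # predicted 0, GT 0
```

Element-wise multiplication plays the role of a logical AND on 0/1 tensors, and `1 - x` plays the role of negation, so the four branches of your control flow collapse into four one-liners.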

The IoU is supposed to be computed over higher-dimensional data, meaning anything from 1D to ND tensors. Since you are computing it pointwise, it loses the meaning of IoU and actually falls back to the simpler statistics that you have correctly laid out in your control flow.

In short, compute all the statistics using tensor operations and you are good to go.

Thank you for your response. So basically, I hadn't provided any details about how I'm computing the IoU, but I'm doing it over the bounding boxes (8 corners) of the GT and predicted objects.

And the reason for the inefficiency (i.e., looping over dimensions) is that I'm dealing with sequences within the batch, and not all sequence elements are relevant. For example, the first element in the batch (14 x 10 x 128) might only have 3 x 128 of valid data, while the rest (7 x 128) is just padding. I do have a mask tensor that tells me which sequence elements are "relevant".

Could you please provide an example of doing this efficiently and how I can compute IOU and then go about computing accuracy and F1? Essentially, I’m looking to calculate F1@0.5IOU (similar to average precision at IOU).
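One way this could look, as a sketch: assume an `ious` tensor holding a per-object IoU value (computed however you do it for your 8-corner boxes) and a `mask` tensor of shape (batch, seq) marking the valid sequence elements. All names and sizes here are illustrative.

```python
import torch

B, S, O = 2, 3, 4  # toy batch, sequence, and object-vector sizes
torch.manual_seed(0)

gt = torch.randint(0, 2, (B, S, O)).float()
logits = torch.randn(B, S, O)
pred = (torch.sigmoid(logits) >= 0.5).float()

# per-object IoU; in practice computed from the 8-corner boxes
ious = torch.rand(B, S, O)

# mask of valid (non-padded) sequence elements, broadcast over the object dim
mask = torch.zeros(B, S)
mask[0, :2] = 1  # e.g. first sample has 2 valid steps
mask[1, :3] = 1
m = mask.unsqueeze(-1)  # (B, S, 1)

# a predicted positive only counts as TP if it matches GT *and* IoU >= 0.5;
# matches with IoU < 0.5 become FPs, mirroring the original control flow
match = pred * gt
tp = (match * (ious >= 0.5).float() * m).sum()
fp = ((pred * (1 - gt) + match * (ious < 0.5).float()) * m).sum()
fn = ((1 - pred) * gt * m).sum()
tn = ((1 - pred) * (1 - gt) * m).sum()

precision = tp / (tp + fp) if (tp + fp) > 0 else torch.tensor(0.)
recall = tp / (tp + fn) if (tp + fn) > 0 else torch.tensor(0.)
f1 = (2 * precision * recall / (precision + recall)
      if (precision + recall) > 0 else torch.tensor(0.))
accuracy = (tp + tn) / (tp + fp + fn + tn)
```

The multiplication by `m` zeroes out all counts from padded positions, so padding never needs an explicit loop; the four counts still sum to the number of valid (unmasked) cells.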

Although, I think you are still leaving some performance on the table. You don't need to perform the comparisons inside the logical_and (you already have 0s and 1s in the tensors); in general, comparisons are expensive (from what I have seen while profiling). Instead, you can negate the values with 1 - gt_labels or 1 - pred_labels. Also, you can exploit the identities TP + FN = GT_P, FP + TN = GT_N, TP + FP = PRED_P, and FN + TN = PRED_N to avoid some of the computations.
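Concretely, those identities mean only one of the four counts needs an element-wise product; the rest follow from sums and subtractions (names illustrative, assuming `pred` and `gt` already hold 0/1 values and no IoU check):

```python
import torch

gt = torch.tensor([1., 0., 1., 1., 0., 0.])
pred = torch.tensor([1., 1., 0., 1., 0., 0.])

n = gt.numel()
gt_p = gt.sum()         # = TP + FN
pred_p = pred.sum()     # = TP + FP

tp = (pred * gt).sum()  # the only count needing a multiply
fn = gt_p - tp          # from TP + FN = GT_P
fp = pred_p - tp        # from TP + FP = PRED_P
tn = n - tp - fn - fp   # everything else
```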