# F1 Score for Multi-label Classification

Hi,

I am trying to calculate the F1 score (and accuracy) for my multi-label classification problem. Could you please give feedback on whether I'm calculating it correctly? Note that I calculate `IOU` (intersection over union) when the model predicts an object as `1`, and I mark it as a `TP` only if the `IOU` is greater than or equal to `0.5`.

GT labels: `14 x 10 x 128`
Output: `14 x 10 x 128`

where `14` is the `batch_size`, `10` is the `sequence_length`, and `128` is the object vector (i.e., `1` if the object at an index belongs to the sequence and `0` otherwise).

```python
for epoch in range(10):
    TP = FP = TN = FN = EPOCH_PRECISION = EPOCH_RECALL = EPOCH_F1 = 0.
    for inputs, gt_labels in tr_dl:
        out = model(inputs)  # out shape: (14, 10, 128)

        # loop through batch samples
        for batch_idx, batch_output in enumerate(out):

            # loop through sequence elements
            # batch_output shape: (10, 128)
            for seq_idx, seq_output in enumerate(batch_output):
                # pred_labels shape: (128,)
                # consider all predictions with probability >= 0.5 as 1, rest as 0
                pred_labels = (torch.sigmoid(seq_output) >= 0.5).long()

                # loop through predicted objects for a sequence element
                for object_idx, pred_label in enumerate(pred_labels):
                    if pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        # calculate IOU (overlap of prediction and gt bounding box)
                        iou = 0.78  # assume we get this iou value for objects at idx
                        if iou >= 0.5:
                            TP += 1
                        else:
                            FP += 1
                    elif pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 0:
                        FP += 1
                    elif pred_label == 0 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        FN += 1
                    else:
                        TN += 1

    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)

    if TP + FP > 0:
        EPOCH_PRECISION = TP / (TP + FP)
    if TP + FN > 0:
        EPOCH_RECALL = TP / (TP + FN)

    # guard against division by zero when both precision and recall are 0
    if EPOCH_PRECISION + EPOCH_RECALL > 0:
        EPOCH_F1 = (2 * EPOCH_PRECISION * EPOCH_RECALL) / (EPOCH_PRECISION + EPOCH_RECALL)
```

Hi,

I looked at your code snippet and I believe that there might be some confusion. I have two comments for you:

1. The implementation is really inefficient. You can compute the statistics without looping over the dimensions; all the operations you need are sum, element-wise multiplication, and negation. I'll leave this as a tip; if you feel you need help, I can write some code for you.
2. IOU is meant to be computed over higher-dimensional data, i.e. anything from 1D to ND tensors. Since you are computing it point-wise, it loses the meaning of IOU and falls back to the simpler statistics that you have correctly encoded in your control flow.
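
As a sketch of point 1, the masked counts can be computed with exactly those operations: sum, element-wise multiplication, and negation via `1 - x`. The function name and shapes below are assumptions based on the thread, not the poster's actual code:

```python
import torch

def masked_confusion_counts(pred, gt, mask):
    """Confusion counts over valid positions only, using sum/mult/negation.

    pred, gt: (B, S, O) tensors of 0s and 1s (the thread's 14 x 10 x 128)
    mask:     (B, S) tensor with 1 for real sequence elements, 0 for padding
    """
    m = mask.unsqueeze(-1)  # (B, S, 1), broadcasts over the object dimension
    tp = (pred * gt * m).sum().item()              # both 1 at a valid position
    fp = (pred * (1 - gt) * m).sum().item()        # predicted 1, gt 0
    fn = ((1 - pred) * gt * m).sum().item()        # predicted 0, gt 1
    tn = ((1 - pred) * (1 - gt) * m).sum().item()  # both 0 at a valid position
    return tp, fp, tn, fn
```

Because the mask is multiplied in before summing, padded positions contribute nothing to any of the four counts, which removes the need for a later correction term.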

In short, compute all the statistics using tensor operations and you are good.

Thank you for your response. So basically, I haven't provided any details about how I'm computing IOU, but I'm doing it over the bounding boxes (8 corners) of the GT and predicted objects.

And the reason for the inefficiency (i.e., looping over dimensions) is that I'm dealing with sequences within the batch, and not all sequence elements are relevant. For example, the first element in the batch (`14 x 10 x 128`) might only have `3 x 128` of real data, while the rest (`7 x 128`) is just padding. I do have a mask tensor that tells me which sequence elements are "relevant".

Could you please provide an example of doing this efficiently and how I can compute IOU and then go about computing accuracy and F1? Essentially, I’m looking to calculate `F1@0.5IOU` (similar to average precision at IOU).
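
For reference, the core of any IoU computation is intersection volume divided by union volume. The poster's boxes are 3D (8 corners), but the idea is easiest to see with axis-aligned 2D boxes given as `(x1, y1, x2, y2)`; this simplified illustration is not the poster's actual implementation:

```python
def iou_2d(box_a, box_b):
    """IoU of two axis-aligned 2D boxes, each given as (x1, y1, x2, y2)."""
    # intersection rectangle: max of the lower corners, min of the upper corners
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # clamp to zero so disjoint boxes get zero intersection, not negative area
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A prediction would then count as a `TP` for `F1@0.5IOU` only when this value is at least `0.5`; the 3D version replaces areas with volumes but is otherwise analogous.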

I was able to make my implementation much more efficient. Am I doing it correctly now?

```python
def calculate_performance_metrics(gt_labels, predicted_labels, total_padded_elements):
    # check if TP pred objects overlap with TP gt objects;
    # we only want the batch and object indices, i.e. the 0 and 2 indices
    TP_INDICES = torch.logical_and(predicted_labels == 1, gt_labels == 1).nonzero()
    TP = calculate_tp_with_iou()  # details of this don't matter for now
    FP = torch.sum(torch.logical_and(predicted_labels == 1, gt_labels == 0)).item()
    TN = torch.sum(torch.logical_and(predicted_labels == 0, gt_labels == 0)).item()
    FN = torch.sum(torch.logical_and(predicted_labels == 0, gt_labels == 1)).item()
    # padded positions were counted as TN above, so subtract them out
    return float(TP), float(FP), float(TN - total_padded_elements), float(FN)

for epoch in range(10):
    EPOCH_TP = EPOCH_FP = EPOCH_TN = EPOCH_FN = EPOCH_PRECISION = EPOCH_RECALL = EPOCH_F1 = 0.
    for inputs, gt_labels, masks in tr_dl:
        outputs = model(inputs)  # outputs shape: (14, 10, 128)

        # mask shape: (14, 10); each padded sequence element contributes
        # 128 spurious "negatives", which we count for the TN correction
        total_padded_elements = torch.sum(masks == 0).item() * 128

        # consider all predictions with probability >= 0.5 as 1, rest as 0
        pred_labels = (torch.sigmoid(outputs) >= 0.5).long()

        batch_tp, batch_fp, batch_tn, batch_fn = calculate_performance_metrics(
            gt_labels, pred_labels, total_padded_elements)
        EPOCH_TP += batch_tp
        EPOCH_FP += batch_fp
        EPOCH_TN += batch_tn
        EPOCH_FN += batch_fn

    EPOCH_ACCURACY = (EPOCH_TP + EPOCH_TN) / (EPOCH_TP + EPOCH_TN + EPOCH_FP + EPOCH_FN)

    if EPOCH_TP + EPOCH_FP > 0:
        EPOCH_PRECISION = EPOCH_TP / (EPOCH_TP + EPOCH_FP)

    if EPOCH_TP + EPOCH_FN > 0:
        EPOCH_RECALL = EPOCH_TP / (EPOCH_TP + EPOCH_FN)

    # guard against division by zero when both precision and recall are 0
    if EPOCH_PRECISION + EPOCH_RECALL > 0:
        EPOCH_F1 = (2 * EPOCH_PRECISION * EPOCH_RECALL) / (EPOCH_PRECISION + EPOCH_RECALL)
```

Much better! Although I think you are still leaving some performance on the table. You don't need the comparisons inside `logical_and` (the tensors already contain 0s and 1s), and comparisons (from what I have seen while profiling) are generally expensive. Instead, you can negate the values with `1 - gt_labels` or `1 - predicted_labels`. You can also exploit the fact that `TP + FN = GT_P`, `FP + TN = GT_N`, `TP + FP = PRED_P`, and `FN + TN = PRED_N` to avoid some computations.
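
Concretely, those identities let you derive all four counts from a single element-wise product plus two sums. This is a sketch under simplified assumptions (flat 0/1 tensors; the thread's mask and IoU handling are omitted, and the function name is illustrative):

```python
import torch

def confusion_from_identities(pred, gt):
    """Derive TP/FP/TN/FN from one product and the marginal totals.

    Uses TP + FP = PRED_P, TP + FN = GT_P, and TP + FP + TN + FN = N.
    """
    tp = (pred * gt).sum().item()     # the only element-wise op needed
    pred_p = pred.sum().item()        # predicted positives
    gt_p = gt.sum().item()            # ground-truth positives
    fp = pred_p - tp                  # from TP + FP = PRED_P
    fn = gt_p - tp                    # from TP + FN = GT_P
    tn = pred.numel() - tp - fp - fn  # everything else is a true negative
    return tp, fp, tn, fn
```

Compared with computing four `logical_and` reductions, this replaces three of them with scalar arithmetic on counts that were needed anyway.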
