F1 Score for Multi-label Classification

Hi,

I am trying to calculate the F1 score (and accuracy) for my multi-label classification problem. Could you please provide feedback on my method and let me know if I'm calculating it correctly? Note that I'm calculating the IOU (intersection over union) whenever the model predicts an object as 1, and marking it as a TP only if the IOU is greater than or equal to 0.5.

GT labels: 14 x 10 x 128
Output: 14 x 10 x 128

where 14 is the batch_size, 10 is the sequence_length, and 128 is the object vector (i.e., 1 if the object at an index belongs to the sequence and 0 otherwise).
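
For context, this is a minimal sketch of what I mean by the IOU check (using simple axis-aligned 2D boxes purely for illustration; the exact box format I use doesn't matter for the question):

def box_iou(box_a, box_b):
    # boxes given as (x1, y1, x2, y2); purely illustrative, not my actual box format
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# a predicted object counts as a TP only if box_iou(pred_box, gt_box) >= 0.5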

for epoch in range(10):
    TP = FP = TN = FN = EPOCH_PRECISION = EPOCH_RECALL = EPOCH_F1 = 0.
    for inputs, gt_labels in tr_dl:
        out = model(inputs) # out shape: (14, 10, 128)

        # loop through batch samples
        for batch_idx, batch_output in enumerate(out):
            
            # loop through sequence elements
            # batch_output shape: (10, 128)
            for seq_idx, seq_output in enumerate(batch_output):
                # pred_labels shape: (128,)
                pred_labels = (torch.sigmoid(seq_output) >= 0.5).long() # treat predictions with probability >= 0.5 as 1, rest as 0
                
                # loop through predicted objects for a sequence element    
                for object_idx, pred_label in enumerate(pred_labels):
                    if pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        # calculate IOU (overlap of prediction and gt bounding box)
                        iou = 0.78 # assume we get this iou value for objects at idx
                        if iou >= 0.5:
                            TP += 1
                        else:
                            FP += 1
                    elif pred_label == 1 and gt_labels[batch_idx, seq_idx, object_idx] == 0:
                        FP += 1
                    elif pred_label == 0 and gt_labels[batch_idx, seq_idx, object_idx] == 1:
                        FN += 1
                    else:
                        TN += 1
          
    EPOCH_ACC = (TP + TN) / (TP + TN + FP + FN)
     
    if TP + FP > 0:
        EPOCH_PRECISION = TP / (TP + FP)
    if TP + FN > 0:
        EPOCH_RECALL = TP / (TP + FN)

    if EPOCH_PRECISION + EPOCH_RECALL > 0: # guard against division by zero when both are 0
        EPOCH_F1 = (2 * EPOCH_PRECISION * EPOCH_RECALL) / (EPOCH_PRECISION + EPOCH_RECALL)

Hi,

I looked at your code snippet and I believe that there might be some confusion. I have two comments for you:

  1. The implementation is really inefficient. You can compute the statistics without looping over the dimensions; all the operations you need are sum, element-wise multiplication, and negation. I'll leave this here as a tip (see the sketch below); if you feel you need help, I can write some code for you.
  2. The IOU is supposed to be computed over higher-dimensional data, i.e., anything from 1D to ND tensors. Since you are computing it point-wise, it loses the meaning of IOU and actually falls back to the simpler statistics that you have correctly laid out in your control flow.

In short, compute all the statistics using tensor operations and you are good :slight_smile:
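
For example, a minimal sketch of the vectorized counts (assuming pred and gt are 0/1 tensors of the same shape; the names are just placeholders):

# pred and gt: 0/1 tensors of the same shape, e.g. (14, 10, 128)
tp = (pred * gt).sum().item()             # predicted 1, ground truth 1
fp = (pred * (1 - gt)).sum().item()       # predicted 1, ground truth 0
fn = ((1 - pred) * gt).sum().item()       # predicted 0, ground truth 1
tn = ((1 - pred) * (1 - gt)).sum().item() # predicted 0, ground truth 0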

Thank you for your response. So basically, I didn't provide any details about how I'm computing the IOU, but I'm computing it over the bounding boxes (8 corners) of the GT and predicted objects.

And the reason for the inefficiency (i.e., looping over the dimensions) is that I'm dealing with sequences within the batch, and not all sequence elements are relevant. For example, the first element in the batch (14 x 10 x 128) might only have 3 x 128 valid entries, while the rest (7 x 128) is just padding. I do have a mask tensor that tells me which sequence elements are "relevant".

Could you please provide an example of how to do this efficiently, how to compute the IOU, and then how to go about computing accuracy and F1? Essentially, I'm looking to calculate F1@0.5IOU (similar to average precision at a given IOU threshold).

I was able to make my implementation much more efficient. Am I doing it correctly now?

def calculate_performance_metrics(gt_labels, predicted_labels, total_padded_elements):
    # check if TP pred objects overlap with TP gt objects
    TP_INDICES = (torch.logical_and(predicted_labels == 1, gt_labels == 1)).nonzero() # we only want the batch and object indices, i.e. the 0 and 2 indices
    TP = calculate_tp_with_iou() # details of this don't matter for now
    FP = torch.sum(torch.logical_and(predicted_labels == 1, gt_labels == 0)).item()
    TN = torch.sum(torch.logical_and(predicted_labels == 0, gt_labels == 0)).item()
    FN = torch.sum(torch.logical_and(predicted_labels == 0, gt_labels == 1)).item()
    return float(TP), float(FP), float(TN - total_padded_elements), float(FN)


for epoch in range(10):
    EPOCH_TP = EPOCH_FP = EPOCH_TN = EPOCH_FN = EPOCH_PRECISION = EPOCH_RECALL = EPOCH_F1 = 0.
    for inputs, gt_labels, masks in tr_dl:
        outputs = model(inputs) # out shape: (14, 10, 128)

        # mask shape: (14, 10). So need to expand it to the shape of output
        masks = masks[:, :, None].expand_as(outputs)
        
        pred_labels = (torch.sigmoid(outputs) >= 0.5).long() # treat predictions with probability >= 0.5 as 1, rest as 0
        pred_labels = pred_labels * masks
        gt_labels = (gt_labels * masks).type(torch.int64)
        total_padded_elements = (masks.numel() - masks.sum()).item() # needed to correct the true negatives for padded positions

        batch_tp, batch_fp, batch_tn, batch_fn = calculate_performance_metrics(gt_labels, pred_labels, total_padded_elements)
        EPOCH_TP += batch_tp
        EPOCH_FP += batch_fp
        EPOCH_TN += batch_tn
        EPOCH_FN += batch_fn

        EPOCH_ACCURACY = (EPOCH_TP + EPOCH_TN) / (EPOCH_TP + EPOCH_TN + EPOCH_FP + EPOCH_FN)
        
        if EPOCH_TP + EPOCH_FP > 0:
            EPOCH_PRECISION = EPOCH_TP / (EPOCH_TP + EPOCH_FP)

        if EPOCH_TP + EPOCH_FN > 0:
            EPOCH_RECALL = EPOCH_TP / (EPOCH_TP + EPOCH_FN)

        if EPOCH_PRECISION + EPOCH_RECALL > 0: # guard against division by zero when both are 0
            EPOCH_F1 = (2 * EPOCH_PRECISION * EPOCH_RECALL) / (EPOCH_PRECISION + EPOCH_RECALL)

@ptrblck @ParGG

Much better :slight_smile:

Although I think you are still leaving some performance on the table. You don't need to perform the comparisons inside the logical_and calls (you already have 0s and 1s in the tensors); in general, comparisons (from what I have seen while profiling) are expensive. Instead, you can negate the values with 1 - gt_labels or 1 - predicted_labels. Also, you can exploit the fact that TP + FN = GT_P, FP + TN = GT_N, TP + FP = PRED_P, and FN + TN = PRED_N to avoid some of the computations.
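
For example, a rough sketch of how those identities could be used (assuming gt_labels and predicted_labels are already masked 0/1 tensors and num_padded is the number of padded positions; the names are just placeholders):

# gt_labels / predicted_labels: masked 0/1 tensors of the same shape
gt_p = gt_labels.sum().item()                        # TP + FN
pred_p = predicted_labels.sum().item()               # TP + FP
tp = (gt_labels * predicted_labels).sum().item()     # both are 1 (before any IOU check)
fp = pred_p - tp
fn = gt_p - tp
tn = gt_labels.numel() - num_padded - tp - fp - fn   # remaining positions, excluding padding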
