How to properly calculate BinarySpecificity

Hi, I am using BinarySpecificity to get the specificity of my binary classification problem, and I am getting a very bad specificity. After checking what's happening, I can see that in batches where there are no negative targets, the specificity is 0. For example:

from torchmetrics.classification import BinarySpecificity
import torch

metric = BinarySpecificity()

output = torch.tensor([0.67, 0.78, 0.65, 1. ])
target = torch.tensor([1, 1, 1, 1])

print(metric(output, target))

The above code prints tensor(0.), so this is negatively affecting the specificity computed for my model.

According to the documentation of BinarySpecificity, the specificity is computed as TN / (TN + FP), and the metric is only properly defined when TN + FP ≠ 0.

How should I proceed when TN + FP = 0? Should I manually set the specificity to 1 (or just skip these cases) when there are no negative values in the targets?
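By "skip these cases" I mean something like this rough sketch, reusing metric, output and target from the snippet above (specificity_list is just a hypothetical per-batch accumulator):

specificity_list = []

if (target == 0).any():
    # the batch contains at least one negative sample, so the metric is defined
    specificity_list.append(metric(output, target).item())
# else: skip the batch, or append a hard-coded 1.0 instead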

Why don't you accumulate all outputs of the entire epoch and compute the metric once?
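For reference, a minimal sketch of that pattern with the stateful torchmetrics API (the two batches here are made-up placeholders, and the second one has no negative targets at all):

from torchmetrics.classification import BinarySpecificity
import torch

metric = BinarySpecificity()

batches = [
    (torch.tensor([0.2, 0.8, 0.9, 0.7]), torch.tensor([0, 0, 1, 1])),
    (torch.tensor([0.6, 0.7, 0.8, 0.9]), torch.tensor([1, 1, 1, 1])),
]

for preds, targets in batches:
    metric.update(preds, targets)  # only accumulates TN/FP/FN/TP counts

# Epoch-level specificity over the accumulated counts: TN=1, FP=1 -> 0.5
print(metric.compute())

metric.reset()  # clear the state before the next epoch

The batch without negatives is no longer a problem, because the counts are pooled over the whole epoch before the ratio is taken.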

Well, that's very clever. In fact, I programmed my train and test loops based on what I've seen in the docs and examples, where they compute the metrics in each batch loop.

But reading again and thinking about what's stated in the docs, I think the phrase “The metric is only proper defined…” answers my question. If in that situation the metric doesn't give any reasonable result, because it isn't defined for this case, I think I should just skip those results.

But for sure I will consider what you proposed for future implementations.

What would happen if all batches contain only a single label, by pure luck or because shuffling is disabled? Are you skipping all metric calculations then? Are you also skipping the entire epoch reporting?

No, I'm not skipping the entire epoch reporting; right now I am accumulating the results of every batch of the current epoch.

For example, in each epoch I have 20 batches, so I accumulate those 20 results, then take the mean for the epoch and report it as, for example, the average accuracy for epoch 1.

I think that in the case where no batch contains a single ‘0’ as target, the list where I am accumulating would stay empty and the mean would probably crash. But the other option, where I always calculate the specificity, is not good either, because in that case the specificity for such a batch would be 0, and that's incorrect.

You would have to if the dataset is not shuffled (and the class samples fit perfectly into batches without creating a mixed distribution). And even if it is, you would randomly skip batches resulting in a skewed epoch metric.
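As a concrete (made-up) example of the skew, averaging per-batch specificities does not give the same number as computing one specificity over the pooled counts, even before any batches are skipped:

from torchmetrics.classification import BinarySpecificity
import torch

metric = BinarySpecificity()

# Two toy batches; the second contains only a single negative sample.
b1 = (torch.tensor([0.2, 0.8, 0.9, 0.7]), torch.tensor([0, 0, 1, 1]))  # TN=1, FP=1 -> 0.50
b2 = (torch.tensor([0.6, 0.7, 0.8, 0.9]), torch.tensor([0, 1, 1, 1]))  # TN=0, FP=1 -> 0.00

# The forward call returns the batch value and also updates the global state.
batch_scores = [metric(preds, targets).item() for preds, targets in (b1, b2)]
print(sum(batch_scores) / len(batch_scores))  # mean of batch values: 0.25

print(metric.compute())  # pooled over the epoch: TN=1, FP=2 -> 0.3333

Dropping the undefined batches on top of that shifts the average even further, since only the batches that happen to contain negatives contribute.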

Yep, I’m shuffling the training dataset.

I'm looking at the code I've developed, and I've written some pseudocode to explain the flow. Please note that I skipped things like the use of tqdm to make it clearer.

from torchmetrics.classification import BinaryConfusionMatrix, BinarySpecificity

conf_matrix = BinaryConfusionMatrix()
specificity_metric = BinarySpecificity()

for epoch in range(epochs):
  model.train()
  specificity_list = []  # per-batch values accumulated for this epoch

  for inputs, metadata, targets in loader:
    optimizer.zero_grad()

    outputs = model(inputs, metadata)

    loss = criterion(outputs, targets.float())
    loss.backward()

    optimizer.step()

    # Metrics things
    (tn, fp), (fn, tp) = conf_matrix(outputs, targets)

    # Specificity is only defined when the batch has negatives (TN + FP != 0)
    if (tn + fp) != 0:
      specificity_list.append(specificity_metric(outputs, targets).item())

... some more code

As you can see, I am only calculating the specificity for a batch iteration if the condition stated in the docs is met (TN + FP ≠ 0). Then I accumulate the specificity of each batch, and at the end of the epoch I do something like

sum(specificity_list) / len(specificity_list)

to get this epoch's specificity value.
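For completeness, this is how I understand the loop would look with the approach you proposed, accumulating over the whole epoch instead of averaging per-batch values (same model, criterion, optimizer and loader as in my pseudocode above):

from torchmetrics.classification import BinarySpecificity

specificity_metric = BinarySpecificity()

for epoch in range(epochs):
  model.train()
  for inputs, metadata, targets in loader:
    optimizer.zero_grad()

    outputs = model(inputs, metadata)

    loss = criterion(outputs, targets.float())
    loss.backward()
    optimizer.step()

    # Only accumulate the confusion-matrix counts; no per-batch score needed.
    specificity_metric.update(outputs, targets)

  # One epoch-level value, defined as long as the whole epoch contains any negatives.
  epoch_specificity = specificity_metric.compute().item()
  specificity_metric.reset()

That way no batch has to be skipped, and the epoch value is not skewed by the composition of individual batches.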