Fastest way to calculate the Matthews correlation coefficient

M2M · December 9, 2021, 3:02pm

Hello everyone,

I am working on semantic segmentation, where I end up comparing very large label tensors against the predictions (each containing more than 5.6 million pixels). At the moment, I convert the labels and predictions into Python lists and use sklearn to calculate the Matthews correlation coefficient, but this is very slow. PyTorch runs the epochs quickly, but everything slows down as soon as I hand the data to sklearn. I would really appreciate any help speeding this up.

The relevant portion of my code:

import torch
from sklearn.metrics import matthews_corrcoef

y_true = []
y_pred = []

with torch.no_grad():
    for data in data_loader:
        images, labels = data
        images = images.to(device)
        # labels are still on the CPU here, so no device-to-host copy is needed
        y_true.extend(torch.flatten(labels).tolist())
        labels = labels.to(device)

        outputs = net(images)
        _, predicted = torch.max(outputs, 1)

        # Move the predictions back to the CPU and collect them for every batch
        predicted = predicted.to('cpu')
        y_pred.extend(torch.flatten(predicted).tolist())

mcc = matthews_corrcoef(y_true, y_pred)  # The line that makes the whole code slow
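
One direction that might avoid the Python lists entirely is to accumulate a confusion matrix on the tensors themselves and compute the coefficient from it at the end. The following is only a minimal sketch under some assumptions: `update_confusion`, `mcc_from_confusion`, and `num_classes` are hypothetical names that are not part of the code above, labels and predictions are assumed to be integer class indices in `[0, num_classes)` (so no ignore index such as 255), and the formula is the standard multiclass generalization of MCC written in terms of the confusion matrix.

```python
import torch


def update_confusion(conf, y_true, y_pred, num_classes):
    """Add one batch of integer labels/predictions to a (C, C) count matrix.

    Hypothetical helper: assumes every value lies in [0, num_classes).
    """
    # Encode each (true, pred) pair as a single index in [0, C*C).
    idx = y_true.reshape(-1).long() * num_classes + y_pred.reshape(-1).long()
    counts = torch.bincount(idx, minlength=num_classes * num_classes)
    return conf + counts.reshape(num_classes, num_classes)


def mcc_from_confusion(conf):
    """Multiclass MCC from a confusion matrix (rows = true, cols = predicted)."""
    c = conf.double()  # float64 avoids integer overflow in the products below
    t = c.sum(dim=1)   # number of pixels per true class
    p = c.sum(dim=0)   # number of pixels per predicted class
    n = c.sum()        # total number of pixels
    cov_tp = torch.trace(c) * n - torch.dot(t, p)
    cov_tt = n * n - torch.dot(t, t)
    cov_pp = n * n - torch.dot(p, p)
    denom = torch.sqrt(cov_tt) * torch.sqrt(cov_pp)
    # Return 0.0 when the denominator vanishes (e.g. only one class present).
    return (cov_tp / denom).item() if denom > 0 else 0.0
```

In the loop above this would presumably look like creating `conf = torch.zeros(num_classes, num_classes, dtype=torch.long, device=device)` once, calling `conf = update_confusion(conf, labels.to(device), predicted, num_classes)` per batch, and calling `mcc_from_confusion(conf)` after the loop. Only a small `num_classes x num_classes` matrix is kept in memory, and no per-pixel data has to be turned into a Python list.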

Any help regarding this will be highly appreciated.