Classification report output

Can anyone explain the output below to me? The individual categories do not seem to perform well, so how is the final avg/total figure so high?
How is this calculated?

Please help.

              precision    recall  f1-score   support

                  0.761     0.801     0.780      2587
                  0.646     0.726     0.684      1191
                  0.462     0.498     0.479      1754
                  0.585     0.676     0.627      1506
                  0.807     0.677     0.736       622
                  0.356     0.319     0.337       276
                  0.716     0.469     0.567      1958
                  0.688     0.685     0.687       957
                  0.975     0.977     0.976     84141

avg / total       0.938     0.938     0.937     94992

Precision – True Positives / (True Positives + False Positives)
Recall – True Positives / (True Positives + False Negatives)
F1-score – the harmonic mean of precision and recall
Support – the number of samples whose true label is that class.
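The definitions above can be sketched as plain functions (the counts below are made-up numbers for one hypothetical class, not taken from the report):

```python
def precision(tp, fp):
    # Of everything the model labeled positive, how much was right?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything that is truly positive, how much did the model find?
    return tp / (tp + fn)

def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r)

# Hypothetical counts for one class:
p = precision(tp=80, fp=20)   # 0.8
r = recall(tp=80, fn=40)      # 2/3
score = f1(p, r)              # 8/11 ≈ 0.727
```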

In other words, precision tells you how good your model is at rejecting wrong labels, and recall tells you how good your model is at assigning the correct label to samples of the true class. To understand this better, read this


It seems the last and best-performing class is the majority class. The average for the current metric was apparently calculated as the micro-average, i.e. by aggregating the contributions of all classes and computing the metric once over the pooled counts.
E.g. if you have two (imbalanced) classes and would like to calculate the sensitivity (TPR, i.e. recall), you could just pool the counts (TP, P) across both classes and compute the metric. For the macro-average you would instead calculate the metric independently for each class and then take the unweighted mean:

Class0 - TPR: 9999/10000=0.9999
Class1 - TPR: 0/1=0.0
micro-average TPR: (9999+0)/(10000+1)=0.9998
macro-average TPR: (0.9999+0.0)/2=0.49995
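The worked example above can be reproduced in a few lines of Python (the per-class counts are exactly the ones from the example):

```python
# Imbalanced two-class TPR example: class 0 has 10000 positives
# of which 9999 were found; class 1 has 1 positive, 0 found.
tps = [9999, 0]      # true positives per class
ps = [10000, 1]      # actual positives (support) per class

# Micro-average: pool the counts across classes, then compute once.
micro_tpr = sum(tps) / sum(ps)

# Macro-average: compute per class, then take the unweighted mean.
macro_tpr = sum(tp / p for tp, p in zip(tps, ps)) / len(tps)

print(micro_tpr)  # ≈ 0.9998 (dominated by the majority class)
print(macro_tpr)  # ≈ 0.49995 (each class counts equally)
```

This is why a report dominated by one huge class (support 84141 of 94992 above) can show a high overall average even when several minority classes perform poorly.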

Thank you for such a lucid explanation! 🙂