Can anyone explain the output below to me? The individual categories do not seem to perform well, so how is the final avg/total figure so high? How is this calculated?
Precision – True Positives / (True Positives + False Positives)
Recall – True Positives / (True Positives + False Negatives)
F1-score – the harmonic mean of precision and recall
Support – the number of samples of the true response that lie in that class.
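The definitions above can be sketched in a few lines of plain Python. The counts here (TP, FP, FN) are hypothetical numbers chosen only to illustrate the formulas:

```python
# Hypothetical per-class counts for illustration.
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)   # 30 / 40 = 0.75
recall = tp / (tp + fn)      # 30 / 50 = 0.6

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(precision, recall, round(f1, 4))  # 0.75 0.6 0.6667
```

In practice scikit-learn's `classification_report` computes all of these per class for you.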
In other words, precision tells you how good your model is at rejecting wrong labels, and recall tells you how good it is at assigning the correct labels to the true classes. To understand this better, read this
It seems the last and best-performing class is the majority class. The average for the current metric was apparently calculated as the micro-average, i.e. by aggregating the contributions of all classes to compute a single average metric.
E.g. if you have two (imbalanced) classes and want to calculate the sensitivity (TPR), the micro-average pools the counts (TP, P) across both classes and computes the metric once. The macro-average instead calculates the metric independently for each class and then takes the unweighted mean:
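A minimal sketch of the difference, using made-up counts for two imbalanced classes (each pair is the hypothetical TP and total positives P for that class):

```python
# Hypothetical (TP, P) counts for a minority and a majority class.
classes = {"minority": (5, 20), "majority": (90, 100)}

# Micro-average: pool all TPs and all positives, then divide once.
micro = sum(tp for tp, p in classes.values()) / sum(p for tp, p in classes.values())
# (5 + 90) / (20 + 100) = 95 / 120 ≈ 0.792

# Macro-average: per-class TPR first, then the unweighted mean.
macro = sum(tp / p for tp, p in classes.values()) / len(classes)
# (0.25 + 0.9) / 2 = 0.575

print(round(micro, 3), round(macro, 3))  # 0.792 0.575
```

Note how the micro-average is dominated by the majority class, which is exactly why the overall figure can look much better than the per-class numbers.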