Multi-label Classification Evaluation Problem

You could create a confusion matrix and calculate the per-class metrics from it, as shown with an example in this post. Would that work for you?
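In case it helps, here is a minimal sketch of that idea in plain Python. It assumes your targets and predictions are already binarized (0/1) multi-hot rows, one column per label, and it treats each label as its own binary problem with a 2x2 confusion matrix. The function names and the toy data are made up for illustration and are not taken from the linked post.

```python
def per_class_confusion(y_true, y_pred):
    """Return one (tp, fp, fn, tn) tuple per label column.

    y_true and y_pred are lists of equal-length 0/1 rows (multi-hot encoding).
    """
    num_classes = len(y_true[0])
    counts = []
    for c in range(num_classes):
        tp = fp = fn = tn = 0
        for true_row, pred_row in zip(y_true, y_pred):
            t, p = true_row[c], pred_row[c]
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
            else:
                tn += 1
        counts.append((tp, fp, fn, tn))
    return counts


def per_class_metrics(counts):
    """Compute precision, recall and F1 for each label from its counts.

    A metric with a zero denominator is reported as 0.0 (a common convention,
    chosen here as an assumption).
    """
    metrics = []
    for tp, fp, fn, tn in counts:
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


# Toy data: 4 samples, 3 labels (hypothetical values).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

counts = per_class_confusion(y_true, y_pred)
for label, (c, m) in enumerate(zip(counts, per_class_metrics(counts))):
    print(f"label {label}: tp/fp/fn/tn={c} "
          f"precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
```

If your model outputs logits or probabilities, you would threshold them first (e.g. at 0.5, which is itself a choice you may want to tune per label) to get the 0/1 predictions used above.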