Hello everyone, I have trained a multi-class classification model on top of a pre-trained ALBERT language model. I am now experimenting with Captum to look at the attribution scores for each class/label. I expected to see positive, maximal attribution scores for the class the model predicts, and negative, smaller scores for the remaining classes. However, I don't really see such a correlation. I am not sure whether I am using Captum incorrectly or whether my expectation is simply wrong. Does anyone have experience with this? Thanks.
Hi Kevin, thank you for the question! Do you have any example code snippets that you could post here?
Jupyter or Google Colab notebooks would be great too.