Integrated Gradients and Text Generation

Hi all,

I am currently trying to explain the generated text of a GPT-2 model using Integrated Gradients and Captum. I can create a minimal example if helpful.

My question is: Is it possible at all to explain text generation? My current approach is to explain each generated token individually: I treat the logits of the output layer as a classification problem and vary the context the model receives. As the target, I use the index of the token that is generated without perturbation. However, the attributions to the input are all zero. Am I making a conceptual mistake here? Is there a resource somewhere on explaining text generation? I couldn't seem to find anything…
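To make the setup concrete, here is a toy sketch of the approach I have in mind, in plain NumPy rather than GPT-2/Captum (all names, shapes, and the linear "model" are illustrative, not the real thing): the next-token logits are treated as a classification output, the target is the token the model would generate, and Integrated Gradients is computed over the context's *embeddings*, since the discrete token IDs themselves are not differentiable.

```python
import numpy as np

# Toy stand-in for a language model: next-token logits are a linear
# function of the summed context embeddings. Purely illustrative.
rng = np.random.default_rng(0)
vocab, dim = 10, 4
E = rng.normal(size=(vocab, dim))      # embedding table
W = rng.normal(size=(dim, vocab))      # output projection

def logits(emb):
    # emb: (seq_len, dim) -> (vocab,) next-token logits
    return emb.sum(axis=0) @ W

def grad_logit(emb, target):
    # d logits[target] / d emb; for this linear toy model it is W[:, target]
    return np.broadcast_to(W[:, target], emb.shape)

def integrated_gradients(emb, target, baseline=None, steps=50):
    # Riemann approximation of IG along the line baseline -> emb
    if baseline is None:
        baseline = np.zeros_like(emb)   # all-zero embedding as baseline
    total = np.zeros_like(emb)
    for alpha in (np.arange(steps) + 0.5) / steps:
        point = baseline + alpha * (emb - baseline)
        total += grad_logit(point, target)
    return (emb - baseline) * total / steps

context = np.array([3, 1, 7])           # token IDs of the prompt
emb = E[context]
target = int(np.argmax(logits(emb)))    # token generated without perturbation

attr = integrated_gradients(emb, target)
# IG completeness: attributions sum to f(input) - f(baseline)
delta = logits(emb)[target] - logits(np.zeros_like(emb))[target]
assert np.isclose(attr.sum(), delta)
per_token = attr.sum(axis=1)            # one attribution score per context token
```

In the real setting I believe the analogue would be Captum's `LayerIntegratedGradients` over the model's embedding layer, but the sketch above is only meant to pin down what I am asking about.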