Integrated gradients with captum and handmade transformer model

Thanks for the update.
The issue seems to be caused by detaching the input tensor when casting it to a LongTensor before passing it to the embedding layer:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 10)
x = torch.randint(0, 10, (1,)).float().requires_grad_()

out = emb(x.long())  # x.long() returns a new tensor that is detached from x
torch.autograd.grad(out, x, grad_outputs=torch.ones_like(out))  # raises an error, since x was never used in the graph

You won’t be able to calculate gradients w.r.t. x, since x.long() is already detached from the computation graph.
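
As a minimal sketch (the names and shapes here are just my example, not taken from your model): gradients do flow w.r.t. the embedding output, which is a float tensor created inside the graph. That's why attribution for token inputs is usually computed on the embedding layer (e.g. with Captum's LayerIntegratedGradients) rather than on the integer indices themselves.

import torch
import torch.nn as nn

emb = nn.Embedding(10, 10)
idx = torch.randint(0, 10, (1,))   # integer indices; no gradient is possible here
emb_out = emb(idx)                 # float output, part of the autograd graph
emb_out.retain_grad()              # keep the grad of this non-leaf tensor

emb_out.sum().backward()
print(emb_out.grad)                # valid gradient w.r.t. the embedding output
print(emb.weight.grad)             # gradients also reach the embedding weights

So instead of trying to backpropagate to the raw indices, you could let the attribution method work in embedding space, which is the usual approach for transformer models.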