The gradient for cls_head is calculated, but the gradient for embedder is None. Why is the gradient of embedder None, and how can I calculate the gradient w.r.t. emb without running into this problem?

Is there any other way to get the gradient with respect to emb?
I need the gradient for both emb and the parameters within the embedder (for some perturbation).

I don’t quite understand the question. In your current code you are implicitly detaching the computation graph by recreating the tensor. If you remove this line of code, it should work without any other changes.
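A minimal sketch of the effect described above, assuming hypothetical embedder and cls_head modules with illustrative sizes, and assuming the tensor was recreated with something like detach() followed by requires_grad_() (the exact line from the question may differ):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the modules in the question.
embedder = nn.Embedding(num_embeddings=10, embedding_dim=4)
cls_head = nn.Linear(4, 2)

tokens = torch.tensor([[1, 2, 3]])
labels = torch.tensor([0])

emb = embedder(tokens)
# Recreating the tensor makes a new leaf that is cut off from embedder.
emb_recreated = emb.detach().clone().requires_grad_(True)

logits = cls_head(emb_recreated.mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

print(embedder.weight.grad)          # None: backward never reaches embedder
print(emb_recreated.grad.shape)      # torch.Size([1, 3, 4])
print(cls_head.weight.grad is None)  # False
```

The recreated tensor and cls_head receive gradients, but the graph ends at the new leaf, so nothing flows back into embedder.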

and according to this, emb should have its grad attribute populated, but as pointed out by @ptrblck, you explicitly detach emb from the graph, and hence the grad of the model (embedder) parameters won’t be populated.

My goal is to transfer the “fast gradient sign method (FGSM)” to NLP. FGSM calculates the gradient w.r.t. the input images and later uses it in an adversarial update step.
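For reference, the FGSM update is commonly written as follows, where x is the input, y its label, θ the model parameters, J the loss, and ε a small step size (the notation is assumed here, not taken from the post):

```latex
x_{\text{adv}} = x + \epsilon \cdot \operatorname{sign}\bigl(\nabla_x J(\theta, x, y)\bigr)
```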

Calculating the gradient w.r.t. (int-valued) token sequences doesn’t work, so my idea was to calculate the gradient w.r.t. the token embeddings instead, so that the token embeddings can later be perturbed in the adversarial update step.
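A rough sketch of that idea, assuming hypothetical embedder and cls_head modules and an illustrative epsilon. It uses torch.autograd.grad to show only the perturbation step; that call returns the gradient w.r.t. emb but does not populate .grad on any parameters:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins; sizes and epsilon are illustrative only.
embedder = nn.Embedding(num_embeddings=10, embedding_dim=4)
cls_head = nn.Linear(4, 2)
epsilon = 0.01

tokens = torch.tensor([[1, 2, 3]])
labels = torch.tensor([0])

emb = embedder(tokens)
loss = nn.functional.cross_entropy(cls_head(emb.mean(dim=1)), labels)

# Gradient of the loss w.r.t. the embeddings only.
(emb_grad,) = torch.autograd.grad(loss, emb)

# FGSM-style step in embedding space: shift each coordinate by epsilon
# in the direction of the gradient's sign.
emb_adv = (emb + epsilon * emb_grad.sign()).detach()

adv_loss = nn.functional.cross_entropy(cls_head(emb_adv.mean(dim=1)), labels)
print(loss.item(), adv_loss.item())
```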

That’s why I need to calculate the gradients for both the embedder weights (for updating them) and the embeddings (for the perturbation). Is there any way to achieve this?

That’s not the case unless you already detached the computation graph before or disabled the gradient calculation through any context manager such as with torch.no_grad().
This code works properly:
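A minimal sketch of such a working setup, assuming hypothetical embedder and cls_head modules with illustrative sizes (not the code from the question):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the modules in the question.
embedder = nn.Embedding(num_embeddings=10, embedding_dim=4)
cls_head = nn.Linear(4, 2)

tokens = torch.tensor([[1, 2, 3]])
labels = torch.tensor([0])

emb = embedder(tokens)  # used directly, so it stays attached to the graph
logits = cls_head(emb.mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

print(embedder.weight.grad is None)  # False
print(cls_head.weight.grad is None)  # False
print(embedder.weight.grad[0])       # all zeros: token 0 was not in the input
```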

emb is an intermediate forward activation and doesn’t have a weight attribute.
If you want to check the gradient from this activation, call .retain_grad() on it:
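A short sketch of that call, again with hypothetical embedder and cls_head modules and illustrative sizes:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-ins for the modules in the question.
embedder = nn.Embedding(num_embeddings=10, embedding_dim=4)
cls_head = nn.Linear(4, 2)

tokens = torch.tensor([[1, 2, 3]])
labels = torch.tensor([0])

emb = embedder(tokens)
emb.retain_grad()  # keep .grad on this non-leaf tensor after backward()

logits = cls_head(emb.mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()

print(emb.grad.shape)                # torch.Size([1, 3, 4])
print(embedder.weight.grad is None)  # False
```

With a single backward() call, both emb.grad (for the perturbation) and embedder.weight.grad (for the parameter update) are available.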