The input you give to an embedding layer is the index of the element you want to embed, and it returns the embedding for that element.
This operation is differentiable wrt the content of the embedding but not the index, so you cannot give it an index for which you want gradients (in your code snippet, you set requires_grad=True for a).
If you are just looking for a linear transformation of the input tensor a, you want to use the nn.Linear layer, not nn.Embedding.
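For instance, here is a minimal sketch contrasting the two layers (written against the current PyTorch API, where Variable has been merged into Tensor; the sizes are made up):

```python
import torch
import torch.nn as nn

# nn.Embedding: input is an integer tensor of indices, which must not
# require gradients; the output is differentiable wrt emb.weight only.
emb = nn.Embedding(num_embeddings=10, embedding_dim=4)
idx = torch.tensor([1, 5, 3])        # LongTensor of indices
out = emb(idx)                       # shape (3, 4)

# nn.Linear: input is a float tensor, and the operation is differentiable
# wrt both the input and the layer's weights.
a = torch.randn(2, 4, requires_grad=True)
lin = nn.Linear(in_features=4, out_features=3)
out2 = lin(a)                        # shape (2, 3)
```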
If you want to achieve that, I think you will need to set the gradients corresponding to these embeddings to 0 manually after performing the backward pass.
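Assuming "that" here means keeping a chosen subset of embedding rows fixed (my reading of the question), a minimal sketch could look like this:

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)
idx = torch.tensor([1, 5, 7])
loss = emb(idx).sum()
loss.backward()

# Hypothetical choice: rows 0-4 should stay fixed. Zero their gradients
# after backward and before the optimizer step so they are never updated.
frozen_rows = torch.arange(5)
with torch.no_grad():
    emb.weight.grad[frozen_rows] = 0
```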
I am having exactly the same problem. I do understand why the problem occurs (surely it makes no sense to compute the gradient w.r.t. the indices).
But I do not understand how to adjust the code so that it computes the gradients w.r.t. the content values of the embedding vectors.
When an autograd function receives an input with requires_grad=True, that means it will need to compute the gradients for this input.
If the function is unable to compute these gradients (as Embedding is wrt the indices in this case), it raises an error when you pass indices with requires_grad=True.
To fix this error, you just need to pass indices with requires_grad=False: either compute them using only Variables that do not require gradients, or, if you learn your indices in some other way, use new_ind = ind.detach(). The detached new_ind will have requires_grad=False, and no gradients will be propagated through it.
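A minimal sketch of the detach fix (the name ind is just a placeholder for indices produced elsewhere in your pipeline):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)

# ind stands in for indices computed elsewhere.
ind = torch.tensor([1, 2, 3])

new_ind = ind.detach()     # requires_grad is False
out = emb(new_ind)         # gradients flow only to emb.weight
out.sum().backward()       # populates emb.weight.grad, not new_ind.grad
```

Note that in current PyTorch an integer index tensor cannot require gradients in the first place, so the explicit detach mainly matters for the older Variable API this thread was written against.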
The Embedding layer internally stores the embedding vectors (its weight), which are created as an nn.Parameter in its constructor. A Parameter can be considered here as a simple Variable with requires_grad=True, so the gradients wrt the embedding vectors will always be computed.
Indeed, you can check that even if the input (indices) of your embedding layer has requires_grad=False, the output will have requires_grad=True (because gradients are needed for the embedding vectors).
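A quick check (a minimal sketch):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)
idx = torch.tensor([1, 5, 3])

print(idx.requires_grad)         # False: indices carry no gradient
print(emb.weight.requires_grad)  # True: the weight is an nn.Parameter
print(emb(idx).requires_grad)    # True: gradients are needed for the weights
```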