Retrieving embedding gradients


I’m implementing a model for a binary NLP classification task with a bidirectional RNN and an attention mechanism on top, and I would like to get the gradients of the embeddings with respect to the predominant predicted class. My issue is that the various approaches I’ve found for obtaining the gradient yield different results. The approaches I tried are:

  1. Calling torch.autograd.grad(loss, model.embedding.weight) and retrieving element 0 of the returned tuple.

  2. Calling retain_grad() on the embeddings and retrieving the gradient after loss.backward() via embedding.grad.

  3. Backpropagating through the output with y_predicted.backward((y_predicted > 0.5).float()), where (y_predicted > 0.5) returns a mask of the same size as y_predicted, with 1’s marking the predominant class. After this backward call I retrieve the gradient with embedding.grad.
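To make the comparison concrete, here is a minimal sketch that runs the three approaches side by side. The ToyClassifier below (its mean-pooling in place of attention, and all sizes) is a placeholder, not the actual bi-RNN model; the mechanics of reading the gradients are the same for any model whose first layer is an nn.Embedding. One detail worth noting: model.embedding.weight is a leaf tensor, so backward() populates its .grad directly; retain_grad() is only needed if you want the gradient of the non-leaf embedded *output* instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the bi-RNN + attention model.
class ToyClassifier(nn.Module):
    def __init__(self, vocab_size=20, embed_dim=8):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc = nn.Linear(embed_dim, 1)

    def forward(self, tokens):
        emb = self.embedding(tokens)           # (batch, seq, embed_dim)
        logits = self.fc(emb.mean(dim=1))      # crude pooling instead of attention
        return torch.sigmoid(logits).squeeze(-1)

model = ToyClassifier()
tokens = torch.randint(0, 20, (4, 5))
target = torch.randint(0, 2, (4,)).float()

# Approach 1: torch.autograd.grad of the loss w.r.t. the embedding matrix.
y_pred = model(tokens)
loss = nn.functional.binary_cross_entropy(y_pred, target)
grad1 = torch.autograd.grad(loss, model.embedding.weight)[0]

# Approach 2: loss.backward(), then read .grad on the embedding matrix.
model.zero_grad()
y_pred = model(tokens)
loss = nn.functional.binary_cross_entropy(y_pred, target)
loss.backward()
grad2 = model.embedding.weight.grad.clone()

# Approaches 1 and 2 differentiate the same loss, so they should agree.
print(torch.allclose(grad1, grad2))  # True

# Approach 3: backprop a 0/1 mask through the raw predictions, no loss.
model.zero_grad()
y_pred = model(tokens)
y_pred.backward((y_pred > 0.5).float())
grad3 = model.embedding.weight.grad.clone()

# This differentiates the sum of predictions over the predicted-positive
# samples, not the loss, so it generally differs from grad1/grad2.
print(torch.equal(grad1, grad3))
```

As the sketch shows, approaches 1 and 2 compute the same quantity (the gradient of the loss), while approach 3 differentiates the raw predictions through a supplied output-gradient mask, so it answers a different question.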

Can anyone advise on which of these three (if any) is the correct approach? If none is, is there a more robust way to obtain the embedding gradients with respect to the predominant class?