Sparse softmax over a vocabulary

Hi,

Suppose my vocabulary size is 10,000. At some point, my model emits scores for 3 words that sit, say, at vocabulary indices 22, 1576, and 9065. The scores tensor has shape (1, 3). How can I obtain a log_softmax over the full vocabulary that I can feed into NLLLoss?
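Concretely, with made-up numbers standing in for the real model outputs, the setup is:

import torch

OUTPUT_DIM = 10000                                    # vocabulary size
word_indices = torch.tensor([22, 1576, 9065])         # vocabulary positions of the scored words
word_scores = torch.randn(1, 3, requires_grad=True)   # placeholder for the (1, 3) scores my model emits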

I tried something like the following, but the gradient does not seem to backpropagate to word_scores (my suspicion is that setting requires_grad = True on the densified tensor turns it into a new leaf and cuts the graph, but I am not sure):

import torch
import torch.nn.functional as F

word_scores = model.foo()        # the (1, 3) scores for the 3 words
index_to_update = model.bar()    # the vocabulary positions of those words
i = torch.LongTensor(index_to_update)
v = word_scores
# Build a sparse (1, OUTPUT_DIM) tensor from the indices and scores, then densify it.
word_attn_energy = torch.sparse.FloatTensor(i.t(), v, torch.Size([1, OUTPUT_DIM])).to_dense()
word_attn_energy.requires_grad = True
log_prob = F.log_softmax(word_attn_energy, dim=1)

In short, I am looking for something similar to TensorFlow’s tf.nn.sparse_softmax_cross_entropy_with_logits.
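For reference, here is a scatter-based sketch I have been experimenting with (the tensors are placeholders standing in for model.foo() and model.bar(); I am not sure this is the idiomatic way). The idea is to fill a full-vocabulary logits tensor with a very negative value and scatter the real scores into their positions, since scatter is differentiable with respect to the source tensor:

import torch
import torch.nn.functional as F

OUTPUT_DIM = 10000                                    # vocabulary size
word_scores = torch.randn(1, 3, requires_grad=True)   # placeholder for model.foo()
index_to_update = torch.tensor([[22, 1576, 9065]])    # placeholder for model.bar()

# Unscored words get a very negative logit, i.e. (almost) zero probability.
full_logits = word_scores.new_full((1, OUTPUT_DIM), -1e9)
# scatter is differentiable w.r.t. word_scores, so gradients can flow back to it.
full_logits = full_logits.scatter(1, index_to_update, word_scores)

log_prob = F.log_softmax(full_logits, dim=1)          # shape (1, OUTPUT_DIM)

target = torch.tensor([1576])                         # gold word index
loss = F.nll_loss(log_prob, target)
loss.backward()
print(word_scores.grad)                               # not None, so the gradient reaches word_scores

Is this roughly equivalent to what the TensorFlow op gives, or is there a more standard way to do it in PyTorch?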