How to update only a subset of parameters on each iteration?

I am trying to implement Word2Vec on the wiki8 dataset.

For every word I initialize a 200-dimensional vector. The vocabulary has 200,000+ words, which means I have to update a large number of parameters on every iteration.

I implemented negative sampling, which means only a small number of words, say 10, are used in each computation.

When doing the backward pass, only those 10 words take part in the computation, but the other 199,990 words' requires_grad is also True. So the 10 sampled words get a non-zero grad, while the other 199,990 get a zero grad.

But "optimizer.step()" is very slow. Is that because the 199,990 unused words still have their parameters updated (which is useless: w = w + 0)?

Here are my questions:

How do I tell the optimizer to update only the useful parameters?
How do I avoid the useless updates?
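To illustrate the situation described above, here is a minimal sketch (the sizes and indices are hypothetical stand-ins for the real model): after backward, the gradient on the full matrix is dense even though only the sampled rows are non-zero, and one workaround is to apply the update by hand to just those rows instead of calling a dense optimizer.step():

```python
import torch

# Hypothetical sizes matching the post: 200,000 words x 200 dims.
vocab_size, dim = 200_000, 200
emb = torch.randn(vocab_size, dim, requires_grad=True)

idx = torch.tensor([3, 17, 42])   # stand-in for the ~10 sampled word ids
rows = emb[idx]                   # only these rows enter the graph
loss = rows.sum()                 # stand-in for the real loss
loss.backward()

# emb.grad is a dense (200,000 x 200) tensor; only the sampled rows
# are non-zero. A dense optimizer.step() still iterates over all
# 200,000 rows, which is the slowdown described above.
nonzero_rows = (emb.grad.abs().sum(dim=1) > 0).sum().item()

# Workaround: manual SGD restricted to the sampled rows.
lr = 0.01
with torch.no_grad():
    emb[idx] -= lr * emb.grad[idx]   # update only the used rows
    emb.grad = None                  # drop the dense gradient buffer
```

This avoids the full pass over the matrix, at the cost of managing the update yourself instead of through an optimizer.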


If you are using nn.Embedding, wouldn't just enabling sparse=True do the trick for you?

Example here:
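A minimal sketch of what that setup might look like (the sizes and indices here are illustrative, not from the post): with sparse=True, backward produces a sparse gradient holding entries only for the indices used in the forward pass, and a sparse-aware optimizer such as SparseAdam only touches those rows.

```python
import torch
import torch.nn as nn

# sparse=True makes backward emit a sparse gradient for the weight,
# covering only the rows looked up in the forward pass.
emb = nn.Embedding(200_000, 200, sparse=True)

# SparseAdam is built for sparse gradients; step() only visits the
# sampled rows instead of all 200,000.
opt = torch.optim.SparseAdam(emb.parameters(), lr=1e-3)

idx = torch.tensor([3, 17, 42])   # stand-in for the sampled word ids
loss = emb(idx).sum()             # stand-in for the real loss
loss.backward()

grad_is_sparse = emb.weight.grad.is_sparse  # True with sparse=True
opt.step()
opt.zero_grad()
```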

No, I didn't use it. I initialized a lookup matrix myself because I have other requirements, so maybe I cannot use sparse=True.
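Even with a hand-rolled lookup matrix, a similar effect is possible. One sketch (names and sizes are hypothetical): keep the big matrix as a plain tensor, slice out only the sampled rows as a small leaf parameter each step, let the optimizer act on that slice, and write the result back.

```python
import torch

# Plain lookup table, not registered with any optimizer.
table = torch.randn(200_000, 200)        # no requires_grad

idx = torch.tensor([3, 17, 42])          # stand-in for the sampled ids
orig = table[idx].clone()                # kept only to show the update

# Make only the sampled rows a (tiny) leaf the optimizer sees.
rows = table[idx].clone().requires_grad_(True)   # 3 x 200, not 200,000 x 200
opt = torch.optim.SGD([rows], lr=0.01)

loss = rows.sum()                        # stand-in for the real loss
loss.backward()
opt.step()                               # touches only the sampled rows

with torch.no_grad():
    table[idx] = rows                    # write the updated rows back
```

The optimizer's per-step cost is then proportional to the number of sampled words, not the vocabulary size.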