Optimizers for `nn.Embedding`

Hi All,

I came across this line in the source code of torch.nn.modules.sparse:

        Keep in mind that only a limited number of optimizers support
        sparse gradients: currently it's `optim.SGD` (`cuda` and `cpu`),
        and `optim.Adagrad` (`cpu`)

But I’ve been using optim.Adam and optim.Adadelta with nn.Embedding for a while without my experiments crashing, so seeing this line confuses me.

Is this line essentially saying that I should only use optim.SGD or optim.Adagrad whenever I have nn.Embedding in my module? Or is the note in the source code no longer true?



All right, I found the answer: nn.Embedding has a constructor argument sparse=False. Only when sparse=True does the layer produce sparse gradients, and only then does the optimizer restriction apply. With the default sparse=False the gradients are dense, so any optimizer works, which is why Adam and Adadelta ran fine.
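
A quick way to verify this yourself (a minimal sketch; sizes and indices here are arbitrary) is to check the `is_sparse` flag on the weight gradient after a backward pass:

```python
import torch
import torch.nn as nn

# Default embedding (sparse=False): gradients are ordinary dense tensors,
# so any optimizer (Adam, Adadelta, ...) can be used.
dense_emb = nn.Embedding(10, 4)
dense_emb(torch.tensor([1, 2, 3])).sum().backward()
print(dense_emb.weight.grad.is_sparse)  # False

# With sparse=True the weight gradient is a sparse tensor, and only
# optimizers that support sparse gradients may be used.
sparse_emb = nn.Embedding(10, 4, sparse=True)
sparse_emb(torch.tensor([1, 2, 3])).sum().backward()
print(sparse_emb.weight.grad.is_sparse)  # True
```

Passing a sparse gradient to an optimizer that does not support it (e.g. Adam) raises a RuntimeError at `step()`, which is the failure the source-code note is warning about.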