Optimizers for `nn.Embedding`

Hi All,

I came across this line in the source code of torch.nn.modules.sparse:

        Keep in mind that only a limited number of optimizers support
        sparse gradients: currently it's `optim.SGD` (`cuda` and `cpu`),
        and `optim.Adagrad` (`cpu`)

But I’ve been using optim.Adam and optim.Adadelta with nn.Embedding for a while without my experiments crashing, so seeing this line confuses me.

Is this line essentially saying that I should only use optim.SGD or optim.Adagrad whenever I have nn.Embedding in my module? Or is the note in the source code no longer true?



All right, I found the answer: nn.Embedding has a constructor argument sparse=False. Only when sparse=True does the layer produce sparse gradients, and only then does the optimizer restriction apply. With the default sparse=False the gradients are dense, so any optimizer works, which is why Adam and Adadelta ran fine.
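
A quick way to verify this yourself (a minimal sketch; sizes and indices here are arbitrary) is to check the `is_sparse` flag on the weight gradient after a backward pass:

```python
import torch
import torch.nn as nn

# Default embedding (sparse=False): gradients are ordinary dense tensors,
# so any optimizer (Adam, Adadelta, ...) can be used.
dense_emb = nn.Embedding(10, 4)
dense_emb(torch.tensor([1, 2, 3])).sum().backward()
print(dense_emb.weight.grad.is_sparse)  # False

# With sparse=True the weight gradient is a sparse tensor, and only
# optimizers that support sparse gradients may be used.
sparse_emb = nn.Embedding(10, 4, sparse=True)
sparse_emb(torch.tensor([1, 2, 3])).sum().backward()
print(sparse_emb.weight.grad.is_sparse)  # True
```

Passing a sparse gradient to an optimizer that does not support it (e.g. Adam) raises a RuntimeError at `step()`, which is the failure the source-code note is warning about.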