[RESOLVED] What's the behavior of optimizer.step() when embedding layer is sparse?

Hi everyone,
I’m using `torch.nn.Embedding` with `sparse=True`. However, it seems that when I call `optimizer.step()`, the optimizer updates all rows of the embedding layer instead of only the rows that were actually used. That is, if the embedding layer is 10,000×100 and I only use the 10th row, it updates the whole 10,000×100 matrix instead of just the 10th row, whose size is only 1×100.

I’m not sure whether my understanding is correct, so I’d like to ask: what is the expected behavior when the `sparse` parameter of the embedding layer is set to `True`?
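
For reference, here’s a minimal sketch of the setup I’m describing (the sizes match the example above; index 10 is just for illustration):

```python
import torch
import torch.nn as nn

# A 10,000 x 100 embedding table with sparse gradients enabled.
emb = nn.Embedding(10_000, 100, sparse=True)

# Look up only row 10 and backprop through it.
loss = emb(torch.tensor([10])).sum()
loss.backward()

# With sparse=True, the gradient is a sparse tensor that only
# covers the rows that were actually used.
print(emb.weight.grad.is_sparse)             # True
print(emb.weight.grad.coalesce().indices())  # tensor([[10]])
```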

Thank you all!

==================SOLUTION===================

Hi guys, I’ve already solved this issue with the help of @fmassa.

The problem was caused by using SGD with momentum, which results in a dense update. The weight_decay parameter causes the same problem. If I use the plain version of SGD (no momentum, no weight decay), the update is much faster.
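Here’s a rough sketch of the difference. Plain SGD can apply the sparse gradient row by row; adding momentum or weight_decay means every row has to be touched on every step (depending on your PyTorch version, this either densifies the update or raises an error):

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10_000, 100, sparse=True)

# Plain SGD (momentum=0, weight_decay=0) handles sparse gradients
# directly, so only the used rows are touched.
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

loss = emb(torch.tensor([10])).sum()
loss.backward()
opt.step()  # fast: only row 10 is updated

# Momentum (or weight_decay) needs to touch every row on every step,
# so the update is no longer sparse. Depending on your PyTorch version
# this is either slow or an outright error:
# opt = torch.optim.SGD(emb.parameters(), lr=0.1, momentum=0.9)
```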

Are you using weight_decay? If so, that would explain the dense updates.

I didn’t set a value for weight_decay. According to the docs, it defaults to zero in that case.

It depends on the optimizer as well. Are you using SGD?

Yes, I am.

It does relate to the optimizer, though: I was using momentum, which results in dense updates.

Thanks for your suggestion!
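
In case it helps others who land here: one possible workaround (a sketch, not something from this thread) is to split the parameters across two optimizers, so that only the sparse embedding uses plain SGD while the dense parameters keep momentum:

```python
import torch
import torch.nn as nn

# Hypothetical two-part model: a sparse embedding plus a dense head.
emb = nn.Embedding(10_000, 100, sparse=True)
head = nn.Linear(100, 1)

# Plain SGD for the sparse embedding (keeps the updates sparse),
# SGD with momentum for the dense parameters.
opt_sparse = torch.optim.SGD(emb.parameters(), lr=0.1)
opt_dense = torch.optim.SGD(head.parameters(), lr=0.1, momentum=0.9)

loss = head(emb(torch.tensor([10]))).sum()
loss.backward()
opt_sparse.step()  # updates only the used embedding rows
opt_dense.step()   # dense momentum update for the head
```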