Hi everyone,
I’m using torch.nn.Embedding with `sparse=True`. However, I find that when I call optimizer.step(), the optimizer updates all rows of the Embedding layer instead of only the rows that were used. That is, if the Embedding layer is 10,000×100 and I only use the 10th row, it updates the whole 10,000×100 matrix instead of only the 10th row, whose size is just 1×100.
I don’t know if my understanding is correct, so I want to ask: what is the expected behavior when the sparse param of the Embedding layer is set to True?
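For reference, here is a minimal sketch (using the same 10,000×100 shape as above) showing that with `sparse=True` the gradient itself is a sparse tensor that only touches the looked-up rows:

```python
import torch

# embedding layer with sparse gradients enabled
emb = torch.nn.Embedding(10000, 100, sparse=True)

# look up only row 10 and backprop a scalar
loss = emb(torch.tensor([10])).sum()
loss.backward()

# the gradient is sparse and only indexes row 10
print(emb.weight.grad.is_sparse)                      # True
print(emb.weight.grad.coalesce().indices().tolist())  # [[10]]
```

So the gradient is sparse as expected; whether the *update* stays sparse depends on what the optimizer does with it.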
Thank you all!
==================SOLUTION===================
Hi guys, I already solved this question with the help of @fmassa.
The problem is caused by the use of momentum
in SGD, which results in a dense update. The weight_decay param will also cause this problem. If I use the simplest version of SGD (no momentum, no weight decay), the update is much faster.
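To illustrate the fix, here is a small sketch (hyperparameters are illustrative) using plain SGD with momentum and weight_decay left at their defaults of 0, and checking that only the looked-up row is actually modified:

```python
import torch

emb = torch.nn.Embedding(10000, 100, sparse=True)
# plain SGD: momentum=0 and weight_decay=0 (the defaults),
# so the sparse gradient leads to a sparse update
opt = torch.optim.SGD(emb.parameters(), lr=0.1)

before = emb.weight.detach().clone()
emb(torch.tensor([10])).sum().backward()
opt.step()

# find which rows changed after the step
changed = (emb.weight.detach() != before).any(dim=1)
print(changed.nonzero().tolist())  # [[10]] -- only row 10 was updated
```

With momentum or weight_decay set to nonzero values, the update can no longer be expressed per-row (it touches every parameter), which is why those options force dense behavior.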