Bug in nn.Embedding when `sparse=True` and `padding_idx` is set


(Tzu Ray Su) #1

Hi everyone, I think there might be a bug in the gradients of nn.Embedding when sparse=True and padding_idx is set. The two code snippets below reproduce it.

import torch
import torch.nn as nn
from torch.autograd import Variable

# below is the same code provided in the documentation
# http://pytorch.org/docs/master/nn.html#embedding
# dense gradients (sparse=False): the padding_idx row should stay zero
input = Variable(torch.LongTensor([[0,2,0,5]]))
embedding = nn.Embedding(10, 3, padding_idx=0)
model = nn.Sequential(embedding)
opt = torch.optim.SGD(model.parameters(), 0.01)
opt.zero_grad()
loss = torch.sum(model(input))
loss.backward()
opt.step()
print(embedding.weight.data)

The first script should print something like the output below. After opt.step() the first row (the padding_idx row) is still zero, consistent with this reply saying that the embedding gradient at the padding index is ignored.

 0.0000  0.0000  0.0000
-1.0657 -1.0059 -1.4740
 0.5380 -0.5131  0.1291
 0.0899 -1.4056  0.0625
 0.1345 -1.0449 -1.5367
 0.9558  2.8128 -2.5808
 0.9454  0.0503 -2.6308
-1.5984 -0.4989  0.0800
 1.7455 -1.4634 -1.4889
-1.0654  0.2526  1.0377
[torch.FloatTensor of size 10x3]
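To make the expected dense behavior concrete, here is a pure-Python sketch of the arithmetic behind the first script (it does not call PyTorch). The helper names `embedding_dense_grad` and `sgd_step` and the non-padding weight values are hypothetical. The assumption is that the backward pass scatter-adds each row of the output gradient into the weight row it was looked up from and skips `padding_idx`, which matches the zero first row printed above.

```python
# Pure-Python model of a dense embedding backward pass with padding_idx.
# Hypothetical helpers; this mirrors the arithmetic, not PyTorch's source code.


def embedding_dense_grad(indices, grad_output, num_embeddings, padding_idx=None):
    """Scatter-add each grad_output row into the weight row it was looked up from."""
    dim = len(grad_output[0])
    grad_weight = [[0.0] * dim for _ in range(num_embeddings)]
    for idx, g in zip(indices, grad_output):
        if idx == padding_idx:
            continue  # the padding row never receives gradient
        for j in range(dim):
            grad_weight[idx][j] += g[j]
    return grad_weight


def sgd_step(weight, grad_weight, lr):
    """Plain SGD update: w <- w - lr * grad, row by row."""
    return [[w - lr * g for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(weight, grad_weight)]


# Same setup as the script: input [0, 2, 0, 5], a 10 x 3 table, padding_idx=0.
indices = [0, 2, 0, 5]
# loss = sum(output), so d(loss)/d(output) is 1.0 everywhere.
grad_output = [[1.0, 1.0, 1.0] for _ in indices]
# Hypothetical weights: padding row is zero, row i holds the value i.
weight = [[0.0, 0.0, 0.0]] + [[float(i)] * 3 for i in range(1, 10)]

grad_weight = embedding_dense_grad(indices, grad_output, 10, padding_idx=0)
new_weight = sgd_step(weight, grad_weight, lr=0.01)
print(new_weight[0])  # padding row: still [0.0, 0.0, 0.0]
print(new_weight[2])  # row 2 moved by -0.01 in each column
```

Index 0 appears twice in the input, yet row 0 receives no update at all, while rows 2 and 5 each move by `-lr * 1.0`.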

However, when sparse=True is set, the first row becomes nonzero after the update. Could this be a bug? (My PyTorch version is 0.2.0_3.)

import torch
import torch.nn as nn
from torch.autograd import Variable

# the row of `padding_idx` becomes non-zero after update when `sparse=True`
input = Variable(torch.LongTensor([[0,2,0,5]]))
embedding = nn.Embedding(10, 3, padding_idx=0, sparse=True)
model = nn.Sequential(embedding)
opt = torch.optim.SGD(model.parameters(), 0.01)
opt.zero_grad()
loss = torch.sum(model(input))
loss.backward()
opt.step()
print(embedding.weight.data)
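The symptom can be modelled the same way for sparse=True. With a sparse weight gradient, the looked-up row indices and the matching output-gradient rows are stored as coordinate (COO) pairs, and duplicate indices accumulate when SGD applies the update. The sketch below is pure Python with hypothetical helper names (`embedding_sparse_grad`, `sparse_sgd_step`) and hypothetical weight values. It assumes the 0.2.0 sparse path simply kept the padding entries; that reproduces the reported nonzero first row, but the cause is not confirmed in this thread.

```python
# Pure-Python model of a sparse (COO) embedding gradient with padding_idx.
# Hypothetical helpers; mask_padding=False imitates the reported 0.2.0 symptom.


def embedding_sparse_grad(indices, grad_output, padding_idx=None, mask_padding=True):
    """Return the gradient as COO pairs: (row indices, value rows)."""
    rows, values = [], []
    for idx, g in zip(indices, grad_output):
        if mask_padding and idx == padding_idx:
            continue  # expected behavior: padding entries are dropped
        rows.append(idx)
        values.append(list(g))
    return rows, values


def sparse_sgd_step(weight, rows, values, lr):
    """Apply w[r] <- w[r] - lr * v for each pair; duplicate rows accumulate."""
    new_weight = [list(w_row) for w_row in weight]
    for r, v in zip(rows, values):
        new_weight[r] = [w - lr * g for w, g in zip(new_weight[r], v)]
    return new_weight


indices = [0, 2, 0, 5]
grad_output = [[1.0, 1.0, 1.0] for _ in indices]
# Hypothetical weights: padding row is zero, row i holds the value i.
weight = [[0.0, 0.0, 0.0]] + [[float(i)] * 3 for i in range(1, 10)]

# Padding entries kept: row 0 is updated twice and drifts away from zero.
buggy_rows, buggy_values = embedding_sparse_grad(indices, grad_output, 0, mask_padding=False)
buggy = sparse_sgd_step(weight, buggy_rows, buggy_values, lr=0.01)

# Padding entries dropped: row 0 stays zero, matching the dense result.
fixed_rows, fixed_values = embedding_sparse_grad(indices, grad_output, 0)
fixed = sparse_sgd_step(weight, fixed_rows, fixed_values, lr=0.01)

print(buggy[0])
print(fixed[0])  # [0.0, 0.0, 0.0]
```

Keeping the gradient as COO pairs is what makes sparse=True cheap: only the rows that were looked up are touched, so the padding mask has to be applied to those pairs rather than to a dense gradient row.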

#2

This is indeed a bug. I've opened an issue and we'll get it fixed: https://github.com/pytorch/pytorch/issues/3506