When should I choose to set sparse=True for an Embedding layer? What are the pros and cons of the sparse and dense versions of the module?
Use sparse=True when most of the embeddings are not updated during training, i.e. each step only updates the representations of the few words that were actually looked up, while the rest stay as they were. For example:
import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        # sparse=True makes the gradient w.r.t. the weight matrix sparse
        self.embedding = nn.Embedding(10, 10, sparse=True)

    def forward(self, x):
        return 2 * self.embedding(x)

net = A()
loss = net(torch.LongTensor([8, 7])).sum()

# Before backward, no gradient has been accumulated yet
for param in net.parameters():
    print(param.grad)  # None

loss.backward()
for param in net.parameters():
    print(param.grad)
gives
tensor(indices=tensor([[8, 7]]),
       values=tensor([[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
                      [2., 2., 2., 2., 2., 2., 2., 2., 2., 2.]]),
       size=(10, 10), nnz=2, layout=torch.sparse_coo)
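A side note that goes beyond the original example: the backward pass produces an uncoalesced sparse tensor, so a repeated index in the batch shows up as duplicate entries rather than one summed row. A minimal sketch reusing the net from above:

# Repeated indices yield duplicate (uncoalesced) entries in the sparse grad
net.embedding.weight.grad = None  # reset the gradient from the run above
net(torch.LongTensor([8, 8])).sum().backward()
g = net.embedding.weight.grad
print(g._nnz())             # 2 -- two duplicate entries for index 8
print(g.coalesce()._nnz())  # 1 -- coalesce() sums them into a single row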
Not all of the word embeddings we store are updated during training, only the looked-up ones receive gradient. If we do not use sparse=True, then
# same forward pass, but with nn.Embedding(10, 10, sparse=False)
loss = net(torch.LongTensor([8, 7])).sum()
loss.backward()
for param in net.parameters():
    print(param.grad)
would give something like,
tensor([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[2., 2., 2., 2., 2., 2., 2., 2., 2., 2.],
[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Most of the embedding rows are not being updated during training, so in that regime it is probably better to use sparse=True. If every input reached the network and all of the embeddings received gradient on every step, sparse=False would be the natural choice.
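To make the trade-off concrete, here is a minimal sketch (the vocabulary size, learning rate, and indices are made up for illustration). The pro of sparse=True is that the gradient only materialises the looked-up rows instead of the full weight matrix; the con is that only a few optimizers accept sparse gradients, currently optim.SGD, optim.SparseAdam, and optim.Adagrad:

import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical scale: 100k-word vocabulary, 100-dim embeddings.
# A dense gradient would materialise all 100_000 x 100 entries
# (~40 MB of float32) on every step, even when the batch only
# touches a handful of rows; the sparse gradient stores just those rows.
emb = nn.Embedding(100_000, 100, sparse=True)

# Only a few optimizers accept sparse gradients:
# optim.SGD, optim.SparseAdam, and optim.Adagrad.
opt = optim.SparseAdam(emb.parameters(), lr=0.01)

opt.zero_grad()
loss = emb(torch.LongTensor([3, 17, 42])).sum()
loss.backward()
print(emb.weight.grad._nnz())  # 3 -- only the looked-up rows carry gradient
opt.step()                     # updates only those rows

The optimizer restriction is the main practical con: SparseAdam, for instance, does not support weight decay, so if you need Adam-style updates with weight decay on the embedding table you generally have to fall back to sparse=False.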