After changing the weights of nn.Embedding, the values at padding_idx also changed

leej35 · April 27, 2017, 8:19pm

Hi,

After I created an nn.Embedding object and changed its weights manually, the values at the padding index also changed, so the embedding output is not what I wanted. Shouldn't nn.Embedding still output 0s at the padding index, even when the initial weights are changed manually?

Following is the example code I tried:

import torch
import torch.nn as nn

embed = nn.Embedding(5, 10, padding_idx=4)
initrange = 0.5
# Re-initializing the whole weight matrix also overwrites row 4 (the padding row)
embed.weight.data.uniform_(-initrange, initrange)
print(embed(torch.LongTensor([4])))

Variable containing:
-0.2718 -0.1496  0.2677 -0.3810 -0.3220 -0.4013  0.2528  0.0429  0.1287 -0.3817
[torch.FloatTensor of size 1x10]

smth · April 29, 2017, 2:50pm

padding_idx is just a specific row index in the weight matrix, so there is no mechanism that keeps that row separate: overwriting the weights overwrites it too.

After you change the weights, you have to reset the padding_idx row to zeros, e.g.:

embed.weight.data[4] = 0
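A slightly more general version of the same fix, as a sketch assuming a recent PyTorch release (where nn.Embedding zero-fills the padding_idx row at construction): re-initialize the weights, then zero whichever row embed.padding_idx points to, doing both under torch.no_grad() instead of going through .data. The helper name reinit_embedding is hypothetical.

```python
import torch
import torch.nn as nn


def reinit_embedding(embed: nn.Embedding, initrange: float = 0.5) -> None:
    """Hypothetical helper: uniform re-init that keeps the padding row at zero."""
    with torch.no_grad():
        # Overwrites every row, including the padding row
        embed.weight.uniform_(-initrange, initrange)
        # padding_idx is just a row index, so restore that row by hand
        if embed.padding_idx is not None:
            embed.weight[embed.padding_idx].fill_(0)


embed = nn.Embedding(5, 10, padding_idx=4)
reinit_embedding(embed)
print(embed(torch.tensor([4])))  # a 1x10 row of zeros
```

Reading the index from embed.padding_idx rather than hard-coding 4 means the same helper works for any embedding layer.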

leej35 · May 2, 2017, 2:58pm

Oh, I see. Thank you so much.

shagunsodhani · February 25, 2018, 4:18pm

Isn't padding_idx special in the sense that no gradient is applied to it? Or is the embedding at the padding index also updated when we use an optimizer?

smth · February 27, 2018, 12:24am

padding_idx is ignored when computing the backward gradients, so the padding row always receives a zero gradient.

Here’s the code location reflecting that: https://github.com/pytorch/pytorch/blob/1848cad10802db9fa0aa066d9de195958120d863/aten/src/ATen/native/Embedding.cpp#L117
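To see this without reading the C++ source, here is a minimal sketch (assuming a recent PyTorch release; the batch of indices is made up for illustration): look up a batch that contains the padding index, backpropagate, and inspect weight.grad. The padding row's gradient stays zero, so a plain SGD step (no momentum, no weight decay) leaves that row untouched.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5, 10, padding_idx=4)
with torch.no_grad():
    # Make the padding row non-zero so we can check that it does not move
    embed.weight[4].fill_(1.0)

opt = torch.optim.SGD(embed.parameters(), lr=0.1)
idx = torch.tensor([[0, 1, 4, 4], [2, 4, 3, 4]])  # index 4 is padding
loss = embed(idx).sum()
loss.backward()

print(embed.weight.grad[4])  # all zeros: padding_idx gets no gradient
print(embed.weight.grad[0])  # all ones: index 0 appears once and the loss is a sum

before = embed.weight[4].clone()
opt.step()
print(torch.equal(before, embed.weight[4]))  # True
```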

shagunsodhani · February 28, 2018, 12:13am

Thanks for the pointer. My confusion came from using sparse embeddings with PyTorch 0.3.0, where the embedding at the padding index was also updated. The bug was fixed in 0.3.1. I'm adding it here in case someone else stumbles across it: https://github.com/pytorch/pytorch/issues/3506

Even_Oldridge · June 13, 2018, 4:21am

Is there an easy way to do this more generally? I'm doing seq2seq where each input is a set of variables rather than a single value. I have a special start-of-sequence set value that I'd like to use to trigger operations within forward, but I'm hoping I can somehow prevent its weights from being updated.

The approach I'm planning for now is to make the sequence all 0s, with the target also all 0s, and hope that the net chooses the easy path of just connecting input to output. Ideally, though, I'd like to not update those weights at all.
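One possible way to get padding-like behavior for an arbitrary set of indices (for example a start-of-sequence token) is to mask their rows in the gradient. The sketch below uses Tensor.register_hook on the embedding weight; frozen_ids and the mask are illustrative assumptions, not a built-in option of nn.Embedding. Note that an optimizer with weight decay can still move those rows, since decay is applied to the weights directly and bypasses the hook.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(10, 4)
frozen_ids = torch.tensor([0, 1])  # hypothetical special tokens, e.g. start-of-sequence

# 1 for trainable rows, 0 for frozen rows; shape (num_embeddings, 1) broadcasts over the embedding dim
mask = torch.ones(embed.num_embeddings, 1)
mask[frozen_ids] = 0.0

# The hook runs on every backward pass and returns the masked gradient,
# which is what gets accumulated into embed.weight.grad
embed.weight.register_hook(lambda grad: grad * mask)

opt = torch.optim.SGD(embed.parameters(), lr=0.1)
before = embed.weight.detach().clone()

idx = torch.tensor([[0, 5, 6], [1, 7, 0]])
embed(idx).pow(2).sum().backward()
opt.step()

after = embed.weight.detach()
print(torch.equal(before[frozen_ids], after[frozen_ids]))  # True: frozen rows unchanged
print(torch.equal(before[5], after[5]))                    # False: row 5 was updated
```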

I'm also curious about the implementation inside the embedding layer. Is there corresponding code that prevents the rest of the net from updating? Or does padding_idx only prevent the embedding from updating?
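A small experiment bearing on that last question, as a sketch assuming a recent PyTorch release (the nn.Linear layer is just an illustrative stand-in for the rest of a network): padding_idx only affects the embedding matrix. The padding row gets a zero gradient, but downstream parameters such as a following layer's bias still receive gradients from padded positions.

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5, 3, padding_idx=4)
proj = nn.Linear(3, 2)  # stand-in for "the rest of the net"

idx = torch.tensor([[4, 4]])  # a sequence made only of padding
proj(embed(idx)).sum().backward()

print(embed.weight.grad[4])  # zeros: padding_idx blocks the embedding-row update
print(proj.bias.grad)        # tensor([2., 2.]): later layers still get gradients
print(proj.weight.grad)      # zeros, but only because the padding row (the layer's input) is zero
```

The zero gradient on proj.weight here is a side effect of the padding row being all zeros, not of any special handling outside nn.Embedding.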

Thanks in advance!