Internal operation of nn.Embedding

Hi, I’m following the tutorial on the official PyTorch site and I’m at the step of building the N-gram language model network.
Along the way I started wondering about several things I might have glossed over without much care,
like nn.Embedding and the loss function.
So I have two questions to ask.

  1. I believe word embeddings are supposed to be learned during training, not fixed, yet the code I wrote while following along gives me exactly the same embedding values as the example. How does nn.Embedding work internally?
    I searched the official documentation and it says:

    Variables:
    weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim)

So, does that mean that, given my dataset and embedding dimension, it will compute a different embedding every time I run the program?
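To make question 1 concrete, here is a minimal sketch of how I currently picture nn.Embedding: a lookup table whose rows are the learnable vectors. The sizes are made up, and it assumes a PyTorch version where plain tensors work without wrapping them in Variable:

    import torch
    import torch.nn as nn

    embedding = nn.Embedding(10, 3)   # vocabulary of 10 words, 3-dimensional vectors (made-up sizes)
    idx = torch.LongTensor([1, 4])    # indices of two words in the vocabulary

    # The lookup just returns the corresponding rows of the learnable weight matrix.
    print(embedding(idx))
    print(embedding.weight[idx])      # same rows as the lookup above

If that picture is right, the returned vectors are just rows of embedding.weight, which is what gets updated during training.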

  2. How does the loss function work?

    def forward(self, inputs):
        # Look up the context word embeddings and flatten them into a single row vector.
        embeds = self.embeddings(inputs).view(1, -1)
        out = F.relu(self.linear1(embeds))
        # out = F.relu(self.linear2(out))
        out = self.linear3(out)                  # scores over the vocabulary
        log_probs = F.log_softmax(out, dim=1)    # log-probabilities, as expected by NLLLoss
        return log_probs

    Usage:

    loss_function = nn.NLLLoss()  # negative log-likelihood loss
    model.zero_grad()             # clear accumulated gradients before this step
    log_probs = model(context_var)
    loss = loss_function(log_probs, Variable(torch.LongTensor([word_to_ix[target]])))
    loss.backward()
    optimizer.step()
    total_loss += loss.data

Above is my source code. Does the loss function take the index of the target, treat that target's probability as 1, and adjust all the related weights that the loss Variable holds, so that the negative log-likelihood loss is minimized?
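In other words, my rough mental model (please correct me if it's wrong) is that NLLLoss simply picks out the log-probability at the target index and negates it, with no explicit one-hot target being built. A small made-up sketch of what I mean:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    scores = torch.randn(1, 5)                # pretend scores over a 5-word vocabulary
    log_probs = F.log_softmax(scores, dim=1)
    target = torch.LongTensor([2])            # index of the target word

    loss = nn.NLLLoss()(log_probs, target)
    manual = -log_probs[0, target.item()]     # negate the log-probability at the target index
    print(loss.item(), manual.item())         # the two numbers are identical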

The NLLLoss docs are pretty sweet.

Looking at the source code I can tell you that weight is randomly initialized (reset_parameters draws it from a standard normal distribution), so the embedding values will differ between runs unless you fix the random seed.
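A quick way to see this for yourself (a minimal sketch, assuming a recent PyTorch where you can just print the Parameter):

    import torch
    import torch.nn as nn

    torch.manual_seed(1)
    print(nn.Embedding(5, 3).weight)   # some random values

    torch.manual_seed(1)
    print(nn.Embedding(5, 3).weight)   # the same values again, because the seed was reset

    print(nn.Embedding(5, 3).weight)   # different values: the weights are sampled fresh each time

If the tutorial script sets a seed near the top, that would explain why you reproduce its embedding values exactly; without a fixed seed they change on every run.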
