Internal operation of nn.Embedding

Hi, I’m following the PyTorch tutorial on the official PyTorch site,
and I’m at the step where you build the network for the N-gram language model.
I started to wonder about several things I may have glossed over,
such as nn.Embedding and the loss function.
So I have two questions.

  1. I believe word embeddings are supposed to be trained, not fixed, but when I run the tutorial code I get exactly the same word embedding values as the example. How does nn.Embedding work?
    I searched the official documentation, which says:

    weight (Tensor) – the learnable weights of the module of shape (num_embeddings, embedding_dim)

So does that mean that, given my dataset and embedding dimension, it will compute a different embedding every time I run the program?

  2. How does the loss function work?

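    def forward(self, inputs):
        # Look up the context-word embeddings and flatten them into one row.
        embeds = self.embeddings(inputs).view(1, -1)
        out = F.relu(self.linear1(embeds))
        #out = F.relu(self.linear2(out))
        out = self.linear3(out)
        # Log-probabilities over the vocabulary (dim=1 is the vocabulary axis).
        log_probs = F.log_softmax(out, dim=1)
        return log_probs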


    loss_function = nn.NLLLoss() # Negative log-Likelihood Loss
    log_probs = model(context_var)
    loss = loss_function(log_probs, Variable(torch.LongTensor([word_to_ix[target]])))
    total_loss +=

Above is my source code. Does the loss function take the index of the target, treat its probability as 1, and then adjust all the parameters that the loss Variable depends on so that the negative log-likelihood is minimized?

The NLLLoss docs are pretty sweet.
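To make that concrete, here is a minimal sketch of what nn.NLLLoss computes for a single example. The 5-word vocabulary, the scores, and the target index are made up for illustration; they are not from the tutorial. The loss is the negated log-probability at the target index. Note that the loss itself changes no parameters: calling loss.backward() computes gradients, and an optimizer step applies them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw scores for a 5-word vocabulary, batch of one context.
logits = torch.tensor([[1.0, 2.0, 0.5, -1.0, 0.0]])
log_probs = F.log_softmax(logits, dim=1)

# Hypothetical index of the correct next word.
target = torch.tensor([1])

loss = nn.NLLLoss()(log_probs, target)

# NLLLoss picks the log-probability at the target index and negates it.
manual = -log_probs[0, target.item()]
print(torch.allclose(loss, manual))
```

So the target index is not "set to probability 1" anywhere; the loss is small when the model already assigns a high probability to the target word and large otherwise.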

Looking at the source code, I can tell you that the weights are randomly initialized, sampled from a standard normal distribution N(0, 1); they are not derived from your dataset.
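A small sketch of both points, assuming a toy table of 10 embeddings of dimension 4 (these sizes and the indices are arbitrary). Fixing the random seed before constructing the layer gives identical initial weights on every run, which is probably why you see the same numbers as the tutorial if its code calls torch.manual_seed. The weights are still learnable: an optimizer step changes the rows that were looked up.

```python
import torch
import torch.nn as nn

# Same seed before construction -> identical initial weights.
torch.manual_seed(1)
emb_a = nn.Embedding(10, 4)
torch.manual_seed(1)
emb_b = nn.Embedding(10, 4)
# No reseed here, so this layer draws fresh random values.
emb_c = nn.Embedding(10, 4)

same_seed_equal = torch.equal(emb_a.weight, emb_b.weight)
differs_without_reseed = not torch.equal(emb_a.weight, emb_c.weight)

# The weights are parameters: one SGD step updates the rows that were used.
opt = torch.optim.SGD(emb_a.parameters(), lr=0.1)
before = emb_a.weight.detach().clone()
loss = emb_a(torch.tensor([2, 5])).sum()  # toy loss over rows 2 and 5
loss.backward()
opt.step()

changed = (emb_a.weight.detach() != before).any(dim=1)
changed_rows = changed.nonzero().flatten().tolist()
print(same_seed_equal, differs_without_reseed, changed_rows)
```

Only the rows for indices that appeared in the input receive a nonzero gradient, so the other rows keep their initial values after this step.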
