NLP in Pytorch Tutorial


Hi, I have been working on a tutorial that gives a fast introduction to deep learning NLP with PyTorch. I feel that the current tutorials focus mostly on CV. There are some NLP examples out there, but I didn’t find anything for beginners (which is what I am looking for, since we are using PyTorch for an NLP class I am TA’ing). So I wrote a tutorial. It assumes NLP knowledge and familiarity with neural nets, but not with deep learning programming.

I wanted to post the tutorial here to get feedback and also because I figure it may be helpful to some people. There are some fast explanations and a lot of code, with a few working examples (nothing state of the art, just things to get an idea). I still need to add a BiLSTM-CRF tagger for NER example, which will be the most complicated one. Here’s the link.

If you look at it, I’m happy to get any feedback. I want it to be useful to the students in my class.


Also checkout which has some NLP tutorials.

(Adam Paszke) #3

Hey, nice work and thanks for sharing!

I have some minor suggestions:

  1. make_bow_vector (cell evaluated as 91) - create vec using torch.zeros(len(word_to_idx))
  2. I’d mention that NLLLoss expects log-probabilities, but you could also use CrossEntropyLoss if you removed the log_softmax.
  3. I’d split the log_probs line in cell 101 into a few more. It’s not very readable with that indentation.

I’d also recommend using tensor indexing to create BoW vectors, as that will likely be faster than iterating over a list in the tensor constructor.
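The suggestions above might look roughly like the following. This is a minimal sketch, not the tutorial's own code: the toy `word_to_idx` vocabulary, the example sentence, and the exact `make_bow_vector` signature are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical toy vocabulary; the tutorial builds its own from the training data.
word_to_idx = {"me": 0, "gusta": 1, "comer": 2, "en": 3, "la": 4}

def make_bow_vector(sentence, word_to_idx):
    # Start from a zero vector with one slot per vocabulary word.
    vec = torch.zeros(len(word_to_idx))
    # Look up all word indices at once, then add 1 per occurrence.
    idx = torch.tensor([word_to_idx[w] for w in sentence], dtype=torch.long)
    vec.index_add_(0, idx, torch.ones(len(sentence)))
    return vec.view(1, -1)

bow = make_bow_vector(["me", "gusta", "la", "la"], word_to_idx)

# NLLLoss expects log-probabilities; CrossEntropyLoss takes the raw scores
# and applies log_softmax internally, so the two losses should agree.
torch.manual_seed(0)
scores = torch.randn(1, 3)      # stand-in for the output of a linear layer
target = torch.tensor([2])
nll = F.nll_loss(F.log_softmax(scores, dim=1), target)
ce = F.cross_entropy(scores, target)
```

`index_add_` is used here instead of plain `vec[idx] += 1` because plain indexed assignment counts a repeated word only once, which would give the wrong count for "la".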


Hi, Thanks for the comments! I will update it when I get the chance.

(Deniz Saner) #5

@rguthrie3, thanks for the amazing tutorial!

By any chance, did you write down a solution for the pretrained embeddings exercise?

(Zhu Tao) #6

I have an implementation of CBOW here. Please try to finish the exercise yourself before checking other people's solutions!

(Pengkai Zhu) #7

Hi zhutao, it is very nice of you to post your implementation, but I have some questions about it.

In your code, you define your CBOW model the same way as the author’s NGramLanguageModeler and only change the context size during training. I believe this can work, but I don’t think it matches the definition of the CBOW model, which is (A*sum(q) + b). I think you can drop the context-size parameter and sum the context embeddings before feeding them into the linear layer.

I am not in the area of NLP so if I misunderstood the model or made a mistake, please point it out and I am happy to discuss with you.


(Ehsan M Kermani) #8

I think it should be along this line:

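import torch.nn as nn
import torch.nn.functional as F

class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super(CBOW, self).__init__()
        # the second argument of each call was cut off in the post;
        # the values below are the ones implied by the constructor signature
        self.embedding = nn.Embedding(num_embeddings=vocab_size,
                                      embedding_dim=embedding_dim)
        self.linear = nn.Linear(in_features=embedding_dim,
                                out_features=vocab_size)

    def forward(self, x):
        # embed the 4 context words into, say, 10 dimensions each,
        # then sum along the rows (dim=0) to get a 1 by 10 vector
        embedding = self.embedding(x).sum(dim=0, keepdim=True)
        out = self.linear(embedding)
        out = F.log_softmax(out, dim=1)
        return out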

But I’m getting RuntimeError: index out of range at /py/conda-bld/pytorch_1493674854206/work/torch/lib/TH/generic/THTensorMath.c:273 and I don’t know why?

(Pengkai Zhu) #9

It is because of the way you generate the word_to_ix dict. In the code the author provided, the dict is generated as:

word_to_ix = {word: i for i, word in enumerate(raw_text)}

Note that this enumerates over raw_text rather than vocab, so some indices are larger than the vocabulary size. I guess that is why you get an index out of range error. You can either change raw_text to vocab, or set vocab_size to the length of raw_text. I hope this addresses your issue.
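To make the cause concrete, here is a small pure-Python sketch. The short `raw_text` sentence is made up for illustration; the names `bad` and `good` are hypothetical and do not appear in the tutorial.

```python
# A made-up sentence with repeated words (16 tokens, 11 distinct words).
raw_text = "we are about to study the idea of a process and the process of a study".split()
vocab = set(raw_text)
vocab_size = len(vocab)

# Enumerating raw_text: a repeated word keeps the position of its LAST occurrence,
# so indices can run up to len(raw_text) - 1.
bad = {word: i for i, word in enumerate(raw_text)}

# Enumerating vocab: indices run from 0 to vocab_size - 1, as nn.Embedding expects.
good = {word: i for i, word in enumerate(sorted(vocab))}

print(vocab_size)            # 11
print(max(bad.values()))     # 15, too large for an embedding table with 11 rows
print(max(good.values()))    # 10
```

Any index greater than or equal to `vocab_size` passed to `nn.Embedding(vocab_size, ...)` triggers the index out of range error.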

(Ehsan M Kermani) #10

Ah, right! Of course, it makes more sense to enumerate vocab so the indices match the embedding table.

(Perikumar Javia) #11

Thank you very much, this is really good for beginners. However, as I am new to PyTorch, I am looking for a tutorial that covers sparse operations, since I am dealing with one-hot vectors. Please point me to any such tutorials if you know of them.


(Sanket Kumar Singh) #12

I am new to PyTorch and am learning NLP and deep learning.
I was going through the CBOW model mentioned here and the explanation on the tutorial page/exercise (here). In the former, two matrices are learned, while in the latter we only learn the word embeddings and the A and b parameters. I think both are saying the same thing, but I couldn’t understand how.
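One way to see the correspondence, offered as a sketch and assuming the two-matrix description is the usual word2vec-style formulation: the embedding table plays the role of the first ("input") matrix, and the weight A of the linear layer plays the role of the second ("output") matrix, with b as an extra bias. The sizes and context indices below are arbitrary values chosen for illustration.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 50, 10   # arbitrary sizes for illustration

# "Input" matrix: one row q_w per vocabulary word.
embeddings = nn.Embedding(vocab_size, embedding_dim)
# A (the "output" matrix) and the bias b.
linear = nn.Linear(embedding_dim, vocab_size)

context = torch.tensor([3, 7, 11, 42])        # four hypothetical context word indices
summed = embeddings(context).sum(dim=0)       # sum of q_w over the context
scores = linear(summed)                       # A * sum(q) + b, one score per word
log_probs = torch.log_softmax(scores, dim=0)

print(embeddings.weight.shape)  # torch.Size([50, 10])
print(linear.weight.shape)      # torch.Size([50, 10])
```

Both learned matrices have one row per vocabulary word, so under this reading the two descriptions differ only in the bias term and in whether the context vectors are summed or averaged.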

I implemented the exercise of CBOW (my code is below). Please let me know if it looks okay.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torch.autograd as autograd

raw_text = """We are about to study the idea of a computational process.
Computational processes are abstract beings that inhabit computers.
As they evolve, processes manipulate other abstract things called data.
The evolution of a process is directed by a pattern of rules
called a program. People create programs to direct processes. In effect,
we conjure the spirits of the computer with our spells.""".split()

vocab = set(raw_text)

word_to_idx = {word: i for i, word in enumerate(vocab)}
idx_to_word = {i: word for i, word in enumerate(vocab)}

# (two words to the left, two words to the right) -> center word
context_target = [
    ([raw_text[i - 2], raw_text[i - 1], raw_text[i + 1], raw_text[i + 2]], raw_text[i])
    for i in range(2, len(raw_text) - 2)
]


class CBOWClassifier(nn.Module):

    def __init__(self, vocab_size, embed_size, context_size):
        # nn.Module must be initialized before submodules are assigned
        super(CBOWClassifier, self).__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_size)
        self.linear1 = nn.Linear(embed_size, 128)
        self.linear2 = nn.Linear(128, vocab_size)

    def forward(self, inputs):
        embed = self.embeddings(inputs)
        # sum the context embeddings into a single 1 x embed_size row
        embed = torch.sum(embed, dim=0, keepdim=True)
        out = self.linear1(embed)
        out = F.relu(out)
        out = self.linear2(out)
        log_probs = F.log_softmax(out, dim=1)
        return log_probs


VOCAB_SIZE = len(word_to_idx)

# Assumption: the post never constructs the model; embed_size=10 and
# context_size=2 are illustrative values.
model = CBOWClassifier(VOCAB_SIZE, embed_size=10, context_size=2)

losses = []
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001)

for epoch in range(100):
    total_loss = torch.Tensor([0])
    for context, target in context_target:
        context_idx = [word_to_idx[w] for w in context]
        context_var = autograd.Variable(torch.LongTensor(context_idx))
        model.zero_grad()
        log_probs = model(context_var)
        target_idx = word_to_idx[target]
        loss = loss_function(log_probs, autograd.Variable(torch.LongTensor([target_idx])))
        # Assumption: the post breaks off after "total_loss +"; the remaining
        # lines follow the usual backward / step / accumulate pattern.
        loss.backward()
        optimizer.step()
        total_loss = total_loss + loss.data
    losses.append(total_loss)


(Emir Ceyani) #13


I also implemented the CBOW model as follows:

The loss is decreasing, but how many epochs are needed to get the expected output for the CBOW exercise?

(Yuqli) #14

I’ve been reading this tutorial and would like to ask why it uses this line:
hello_embed = embeds(autograd.Variable(lookup_tensor))

instead of
hello_embed = embeds(Variable(lookup_tensor))

In other words, why prefix Variable with autograd?

I ask because type(hello_embed) is the same for both lines (<class 'torch.autograd.variable.Variable'>).
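A short sketch of what is probably going on: the two spellings refer to the same class and differ only in how it was imported. The snippet below assumes both import styles are present in the same session.

```python
import torch
import torch.autograd as autograd       # gives access to autograd.Variable
from torch.autograd import Variable     # gives access to the bare name Variable

# Both names point at the same class object.
print(autograd.Variable is Variable)    # True

lookup_tensor = torch.LongTensor([0])
a = autograd.Variable(lookup_tensor)
b = Variable(lookup_tensor)
same_type = type(a) is type(b)
```

If only `import torch.autograd as autograd` has been run, the bare name `Variable` is undefined, which is likely why the tutorial writes `autograd.Variable`. In PyTorch 0.4 and later, Variable was merged into Tensor, so the wrapper is no longer needed at all.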


(Andy Markman) #15

@rguthrie3 Hi, I saw that you don’t have a language model example. I am working on a clean implementation of a word-level language model, and I guess you might know a bit about this question: "Why is Hidden Variable out of Network Class in Pytorch examples Language Model?". Please take a look! Thanks for the tutorial, by the way. Tutorials like this are usually very helpful.