LSTM text generator repeats same words over and over

(Julian1070) #1

Hey!

I built an LSTM for character-level text generation with PyTorch. The model trains well (the loss decreases reasonably, etc.), but the trained model ends up outputting the last handful of words of the input repeated over and over again.

I have played around with the hyperparameters a bit, and the problem persists. I’m currently using:

  • Loss function: BCE

  • Optimizer: Adam

  • Learning rate: 0.001

  • Sequence length: 64

  • Batch size: 32

  • Embedding dim: 128

  • Hidden dim: 512

  • LSTM layers: 2

I also tried not always choosing the top choice, but this only introduces incorrect words and doesn’t break the loop. I’ve been looking at countless tutorials, and I can’t quite figure out what I’m doing differently/wrong.

The following is the code for training the model. training_data is one long string, and I’m looping over it, predicting the next character for each substring of length SEQ_LEN. I’m not sure if my mistake is here or elsewhere, but any comment or direction is highly appreciated!

loss_dict = dict()
for e in range(EPOCHS):
    print("------ EPOCH {} OF {} ------".format(e+1, EPOCHS))
    
    lstm.reset_cell()
    
    for i in range(0, DATA_LEN, BATCH_SIZE):
        
        if i % 50000 == 0:
            print(i/float(DATA_LEN))
        
        optimizer.zero_grad()
        
        input_vector = torch.tensor([[
            vocab.get(char, len(vocab)) 
            for char in training_data[i+b:i+b+SEQ_LEN]
        ] for b in range(BATCH_SIZE)])
        
        if USE_CUDA and torch.cuda.is_available():
            input_vector = input_vector.cuda()
        
        output_vector = lstm(input_vector)        
        
        target_vector = torch.zeros(output_vector.shape)
        
        if USE_CUDA and torch.cuda.is_available():
            target_vector = target_vector.cuda()
        
        for b in range(BATCH_SIZE):
            target_vector[b][vocab.get(training_data[i+b+SEQ_LEN])] = 1
        
        error = loss(output_vector, target_vector)
        
        error.backward()
        optimizer.step()
        
        loss_dict[(e, int(i/BATCH_SIZE))] = error.detach().item()
(Sebastian Raschka) #2

I also tried not always choosing the top choice, but this only introduces incorrect words and doesn’t break the loop.

Hm, if you want to generate different texts, you should sample randomly (i.e., according to the predicted probabilities) – is that what you were doing when you said that you are not always choosing the top choice?

Also, you probably want to reset the hidden state after each generated text.
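
Roughly, the sampling step could look like this (just a sketch, treating your model’s output as logits; sample_next and the temperature knob are made-up names here, not something from your code):

import torch
import torch.nn.functional as F

# Sketch: turn the model output (treated as logits) into a probability
# distribution and sample from it, instead of always taking the argmax.
# temperature < 1 sharpens the distribution, > 1 flattens it.
def sample_next(logits, temperature=1.0):
    probs = F.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).item()

# and call something like lstm.reset_cell() before starting each new text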

Not a great one, but I have a simple CharRNN here in case that’s useful: https://github.com/rasbt/stat479-deep-learning-ss19/blob/master/L14_intro-rnn/code/char_rnn.ipynb

(Julian1070) #3

Thank you! I’m looking through your notebook and will make some changes to my approach (e.g. sample text randomly). I think it’ll be helpful.

What I did regarding top choices is that I defined a threshold (e.g. 0.8 times the max) and sampled randomly among all characters that exceeded the threshold. But since the network just ends up repeating the last words of the input, I think it trains to a point where the gap between the 1st and 2nd choice is super large…
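
Concretely, the thresholding was something like this (just a sketch, where probs stands for the model’s output scores for a single step):

import torch

# Keep every character whose score is at least 0.8x the maximum,
# then pick uniformly at random among those candidates.
def sample_above_threshold(probs, ratio=0.8):
    threshold = ratio * probs.max()
    candidates = torch.nonzero(probs >= threshold).flatten()
    pick = candidates[torch.randint(len(candidates), (1,))]
    return pick.item()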

I have a feeling my problem may have to do with the way I’m feeding the network the input and target during training or something? I’m having a hard time wrapping my head around what, conceptually, can lead to this repetition of the last words of the input…?

(Sebastian Raschka) #4

I think it trains to a point where the gap between 1st choice and 2nd choice is super large…

Good point, I think this can easily happen.

I have a feeling my problem may have to do with the way I’m feeding the network the input and target during training or something?

Yeah, your issue may already occur during training (as opposed to “inference”). Are you drawing random chunks from your training set? If not, you probably want to do that. It looks like in

    input_vector = torch.tensor([[
        vocab.get(char, len(vocab)) 
        for char in training_data[i+b:i+b+SEQ_LEN]
    ] for b in range(BATCH_SIZE)])

that the network always gets the same inputs in the same order during training? For debugging, maybe try printing out the text chunks you are feeding the network to make sure they are not repeating.
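
If you do switch to random chunks, a minimal sketch could look like this (assuming training_data, vocab, SEQ_LEN, and BATCH_SIZE as in your code; note that the targets here come back as plain class indices rather than the one-hot vectors you build for BCE):

import random
import torch

# Sketch: draw each batch from random positions in the corpus instead of
# walking over it in the same fixed order every epoch.
def random_batch(training_data, vocab, seq_len, batch_size):
    starts = [random.randint(0, len(training_data) - seq_len - 1)
              for _ in range(batch_size)]
    inputs = torch.tensor([[vocab.get(c, len(vocab))
                            for c in training_data[s:s + seq_len]]
                           for s in starts])
    targets = torch.tensor([vocab.get(training_data[s + seq_len], len(vocab))
                            for s in starts])
    return inputs, targets

Each call then gives a fresh batch, e.g. input_vector, target_indices = random_batch(training_data, vocab, SEQ_LEN, BATCH_SIZE).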

(Julian1070) #5

Okay, it was actually a stupid mistake I made in producing the characters with the trained model: I got confused with the batch size and assumed that at each step the network would predict an entire batch of new characters when in fact it only predicts a single one… Yikes!
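
In other words, the fix was to append exactly one predicted character per forward pass, roughly like this (a sketch; idx2char is my assumed index-to-character mapping, and you could of course sample instead of taking the argmax):

import torch

# Corrected generation loop: each forward pass yields ONE next character
# (not a batch of them), which is appended before the next step.
def continue_text(lstm, seed_text, vocab, idx2char, seq_len, num_new_chars=100):
    generated = seed_text
    for _ in range(num_new_chars):
        x = torch.tensor([[vocab.get(c, len(vocab)) for c in generated[-seq_len:]]])
        output = lstm(x)[0]              # a single distribution over the vocabulary
        generated += idx2char[int(torch.argmax(output))]
    return generated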

Anyways, thanks for your advice, and I’ll see if I can use it to fine-tune the results a bit!

(Sebastian Raschka) #6

Glad to hear that you found the issue. And yeah, I find working with text data pretty complicated (compared to image data) – it’s very easy to sneak in coding errors around how the text is sampled.

(Mehdi Seifi) #7

Hi,
Can you elaborate, please? I have the same problem.
I’m trying to give my model a sentence, append the model’s predicted word to the end of the sentence, then slide the window and feed the sentence back into the model (from the second word up to the previously predicted word), and so on… but all I get is a repeated word.

Here is how I prepare data batches:

import random
import numpy as np
import torch
import torch.nn as nn

# get words
words = data.split()
# remove repeated words to get vocab:
vocab = set(words)
print(f'\nvocab size: {len(vocab)}')
# make word dict
word2idx = {w:idx for idx, w in enumerate(vocab)}
idx2word = {idx:w for w, idx in word2idx.items()}

# get data batches
sequence_len = 30
batch_size = 50
word_size = 20
num_epochs = 2

data_batches = []
for i in range(len(words) - sequence_len - 1):  # 1 for target word
	data = [word2idx[x] for x in words[i: i + sequence_len]]
	target = word2idx[words[i + sequence_len]]

	data_batches.append([data, target])

num_batchs = int(np.ceil(len(data_batches) / batch_size))
random.shuffle(data_batches)
print(f'\nnumber of batches: {num_batchs}')

This is my model:

class MyNet(nn.Module):
	def __init__(self, vocab_size, word_size, sequence_len, batch_size, hidden_dim):
		super().__init__()

		self.seq_len = sequence_len
		self.batch_size = batch_size
		self.hidden_dim = hidden_dim
		self.num_layers = 2
		self.h_0 = None
		self.c_0 = None

		self.encoder = nn.Embedding(vocab_size, word_size)
		self.lstm = nn.LSTM(input_size=word_size, hidden_size=hidden_dim,
							num_layers=self.num_layers, batch_first=True, dropout=0)
		self.linear = nn.Linear(hidden_dim, vocab_size)

	def reset_hidden(self):
		# For initialising/resetting hidden state.
		self.h_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)
		self.c_0 = torch.zeros(self.num_layers, self.batch_size, self.hidden_dim)

	def forward(self, x):
		encoded = self.encoder(x)
		# print(f'encoded: {encoded.shape}')
		lstm_out, (self.h_0, self.c_0) = self.lstm(encoded)
		# print(f'{lstm_out.shape}, {lstm_out[:, -1, :].shape}')
		# Only take the output from the final timestep
		y_pred = self.linear(lstm_out[:, -1, :])
		
		return y_pred

And this is training process:

# begin training
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = MyNet(len(vocab), word_size, sequence_len, batch_size, 50).to(device)
criterion = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


for e in range(num_epochs):
	print(f'\n\nepoch #{e}:\n')
	model.reset_hidden()
	
	for i in range(num_batchs):
		batch = data_batches[i * batch_size: (i+1) * batch_size]
		x = torch.tensor([b[0] for b in batch], device=device)
		y = torch.tensor([b[1] for b in batch], device=device)
		# print(x.shape, y.shape)

		y_pred = model(x)

		loss = criterion(y_pred, y)

		optimizer.zero_grad()
		loss.backward()
		optimizer.step()

		if i % 50 == 0:
			print(f'\tbatch #{i}:\tloss={loss.item():.10f}')

torch.save(model.state_dict(), './model.pth')

Finally this is how I test my model:

# testing model
# choosing a random sequence
sentence = data_batches[np.random.randint(1, 500)][0]
print('\n', [idx2word[idx] for idx in sentence])
model.eval()

# trying to predict next 10 words
for i in range(10):
	# model.reset_hidden()
	x = torch.tensor(sentence[i : sequence_len +  i], device=device)
	out = model(x.view(1, *x.size()))
	out = torch.argmax(out)

	word = idx2word[out.detach().item()]
	print(word)

	sentence.append(out.detach().item())