Hey!
I built an LSTM for character-level text generation with PyTorch. The model trains well (the loss decreases steadily), but the trained model ends up repeating the last handful of words of the input over and over again. For instance:
I have played around with the hyperparameters a bit, and the problem persists. I’m currently using:
- Loss function: BCE
- Optimizer: Adam
- Learning rate: 0.001
- Sequence length: 64
- Batch size: 32
- Embedding dim: 128
- Hidden dim: 512
- LSTM layers: 2
I also tried not always choosing the top choice, but this only introduces incorrect words and doesn’t break the loop. I’ve been looking at countless tutorials, and I can’t quite figure out what I’m doing differently/wrong.
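To make "not always choosing the top choice" concrete, here is a minimal sketch of temperature sampling in plain Python. It assumes the model emits one unnormalized score (logit) per character. The post does not show the generation code, so `sample_index`, `logits`, and `temperature` are illustrative names, not the original implementation.

```python
import math
import random


def sample_index(logits, temperature=1.0, rng=random):
    """Draw one index from softmax(logits / temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    # random.choices normalizes the weights internally
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


# Hypothetical scores over a 4-character vocabulary (made up for illustration)
logits = [2.0, 1.0, 0.5, -1.0]
rng = random.Random(0)
draws = [sample_index(logits, temperature=1.0, rng=rng) for _ in range(1000)]

# A very low temperature behaves like always taking the top choice
greedy_like = sample_index(logits, temperature=0.01, rng=random.Random(1))
```

A low temperature approaches greedy decoding, while a higher one flattens the distribution and picks unlikely characters more often. In PyTorch the same draw is typically done with `torch.multinomial` on the softmax of the scaled logits.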
The following is the code for training the model. training_data is one long string, and I’m looping over it, predicting the next character for each substring of length SEQ_LEN. I’m not sure if my mistake is here or elsewhere, but any comment or direction is highly appreciated!
loss_dict = dict()
for e in range(EPOCHS):
    print("------ EPOCH {} OF {} ------".format(e+1, EPOCHS))
    lstm.reset_cell()
    for i in range(0, DATA_LEN, BATCH_SIZE):
        if i % 50000 == 0:
            print(i/float(DATA_LEN))
        optimizer.zero_grad()
        input_vector = torch.tensor([[
            vocab.get(char, len(vocab))
            for char in training_data[i+b:i+b+SEQ_LEN]
        ] for b in range(BATCH_SIZE)])
        if USE_CUDA and torch.cuda.is_available():
            input_vector = input_vector.cuda()
        output_vector = lstm(input_vector)
        target_vector = torch.zeros(output_vector.shape)
        if USE_CUDA and torch.cuda.is_available():
            target_vector = target_vector.cuda()
        for b in range(BATCH_SIZE):
            target_vector[b][vocab.get(training_data[i+b+SEQ_LEN])] = 1
        error = loss(output_vector, target_vector)
        error.backward()
        optimizer.step()
        loss_dict[(e, int(i/BATCH_SIZE))] = error.detach().item()
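
To show which (input, target) pairs the loop above produces, here is a small torch-free sketch of the same windowing. The helper `make_batch` and the toy `text` are hypothetical and exist only for illustration. Unlike the loop above, the sketch returns each target as a character index rather than a one-hot row, and it uses the unknown-character fallback for targets too, which the original target lookup does not.

```python
def make_batch(text, vocab, start, seq_len, batch_size):
    """Return (inputs, targets) for one batch, mirroring the loop above.

    inputs[b]  : indices of text[start+b : start+b+seq_len]
    targets[b] : index of the character that follows that window
    """
    unk = len(vocab)  # same fallback index as vocab.get(char, len(vocab))
    inputs, targets = [], []
    for b in range(batch_size):
        window = text[start + b : start + b + seq_len]
        nxt = text[start + b + seq_len]  # raises IndexError past the end of text
        inputs.append([vocab.get(c, unk) for c in window])
        targets.append(vocab.get(nxt, unk))
    return inputs, targets


# Toy data, made up for illustration
text = "hello world"
vocab = {c: i for i, c in enumerate(sorted(set(text)))}
inputs, targets = make_batch(text, vocab, start=0, seq_len=4, batch_size=3)

# Decode the indices back to characters to inspect the pairs
inv = {i: c for c, i in vocab.items()}
for row, tgt in zip(inputs, targets):
    print(repr("".join(inv[i] for i in row)), "->", repr(inv[tgt]))
```

Each row of a batch is the previous row shifted by one character, so neighbouring rows share SEQ_LEN - 1 characters, and the last windows need a character at index i+b+SEQ_LEN to exist in the string.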