I want to train an LSTM to predict a word given the two preceding words. For example, if ‘dog’ and ‘drink’ are fed to the LSTM, it should predict ‘water’ as the next word. To do so, I read the examples ‘N-Gram Language Modeling’ and ‘An LSTM for Part-of-Speech Tagging’ in the official tutorial, and replaced the model in the ‘N-Gram Language Modeling’ example with an adjusted version of the LSTMTagger class from the ‘An LSTM for Part-of-Speech Tagging’ example.
However, I found a very strange phenomenon when saving and loading models. Suppose a model reaches an accuracy of 60% after training and is saved to local disk. If I then load the trained model and repeat training on the same training data, the first epoch yields an accuracy near 0. It seems the loaded model is being trained from scratch.
This is strange because I applied the same save-and-load strategy to the examples ‘N-Gram Language Modeling’ and ‘An LSTM for Part-of-Speech Tagging’, and the loaded models for both examples yield the expected accuracies. For instance, if the model in ‘N-Gram Language Modeling’ reaches an accuracy of 80%, then after it is loaded and trained again, the accuracy in the first epoch is about 80%.
I hope someone can explain what is wrong in my script; I cannot spot the bug because of my limited experience with PyTorch. The runnable script is below. It runs on 64-bit Windows 10 with PyTorch 0.4.
import re, string, time, os, subprocess
from pathlib import Path
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
torch.manual_seed(1)
test_sentence = """A long time ago, there were four families who lived in a small village in Somalia. The first family would argue all of the time, the second family were very greedy, the third family were always away from the village exploring because they were never happy with what they had or where they lived. But the fourth family were calm and patient, and they enjoyed living in their small community.
One night, the daughter of the third family was out exploring when she discovered a well hidden among some trees in the wilderness. The daughter ran back to her family and told them about the well and so they started to use the well to get their water.
It was not long before the other families heard news of the well, and very soon all four families were using the well to get their water until it was in danger of running dry.
This went on for some time, and it was obvious that the water in the well was getting lower and lower, yet none of the families wanted to stop using the well as it was close to the village and meant that they did not have to walk so far to get the water which they used to drink and cook and clean with.
One day, the wise chief, who had always known about the secret well, spoke to each family in turn. The chief said to them, ‘Tonight you must stay in your homes. You must not use the well for one whole night, that way the water will have time to rise once more.’
Each of the families agreed to stay away from the well, especially as the wise chief warned that there would be a severe punishment for any family who disobeyed this simple rule.
But when night fell, the son of the first family could not resist visiting the well as he wanted to make sure he had plenty of water for the following day so that his family would not argue over who would walk the long distance to the usual well used by the rest of the villagers. He crept out to the well carrying two large buckets and filled them both to the top before returning to his home and hiding the buckets where they would not be seen.
Not long after, the son of the second family also crept out to the well and filled two large buckets all the way to the top as he was very greedy and wanted the water for his family alone.
Then the daughter of the third family also crept out to the well as she could not resist exploring at night and reasoned that it was she who had discovered the well in the first place so it was her family who deserved the extra water despite the warning from the wise chief.
The next day, the chief visited the well and was distressed to find that it was completely dry. He waited until he knew that all of the families were away from their homes, then he visited each home in turn.
In the first home he discovered the two buckets, one of which was already empty, but the other still contained the water which was stolen from the well. When he visited the second and third homes he also discovered the buckets of water hidden where nobody would see them. But when he visited the fourth home he discovered that the buckets were dry and realised that the patient family had remained in their beds all night. They had listened to his warning and had stayed away from the well so that the water might rise once more.
The wise chief called all four families to the meeting place in the village where he confronted them about the well. ‘You three families all stole water from the well even though I told you not to,’ said the chief in a stern voice. ‘I know this because I visited your homes this morning and discovered the buckets of water. Because you defied my instructions you will be forced to remain in your homes for thirty days and nights without food or water as punishment. I hope that you will spend this time thinking about the wrong you have done.’
To the fourth family he said, ‘You listened to my simple instructions and stayed in your home last night and did not visit the well. Take this letter and open it when you return to your home.’
The fourth family took the letter and returned home. When they opened the letter there was a map inside. The family followed the directions on the map and after travelling for many miles they discovered a well surrounded by an abundance of fruit trees and vegetable plants. There was enough food and water to last the family a whole lifetime!
The families who were forced to stay in their homes without food or water learned a valuable lesson that day. They learned that it was always best to listen to the advice of one’s elders and not to take things when you were told not to. They also realised that the fourth family were rewarded for their patience and their willingness to follow the simple rules which benefit a community.
""".lower().split()
# we should tokenize the input, but we will ignore that for now
# build a list of tuples. Each tuple is ([ word_i-2, word_i-1 ], target word)
trigrams = [([test_sentence[i], test_sentence[i + 1]], test_sentence[i + 2])
            for i in range(len(test_sentence) - 2)]
SAVE_PATH = Path(os.path.join('.', 'NGramModel.tar')).resolve()
vocab = set(test_sentence)
word_to_ix = {word: i for i, word in enumerate(vocab)}
IS_FORCE_TRAIN = True
EMBEDDING_DIM = 10
HIDDEN_DIM = 64
VOCAB_SIZE = len(vocab)
TAGSET_SIZE = VOCAB_SIZE
EPOCH_NUM = 10
def prepare_sequence(input_sentence, input_word_to_ix):
    """
    Given a list of strings and the corresponding index map, return the string indices wrapped in a tensor.
    This function provides the data type required by nn.Embedding().
    """
    # Use the mapping passed as an argument rather than the global word_to_ix
    idxs = [input_word_to_ix[w] for w in input_sentence]
    return torch.tensor(idxs, dtype=torch.long)
# Create the model:
class LSTMTagger(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, vocab_size, tagset_size):
        super(LSTMTagger, self).__init__()
        self.hidden_dim = hidden_dim
        self.word_embeddings = nn.Embedding(vocab_size, embedding_dim)
        # The LSTM takes word embeddings as inputs and outputs hidden states
        # with dimensionality hidden_dim. The two context-word embeddings are
        # concatenated into one input vector, so the input size is the number
        # of context words (2) times the embedding dimension.
        self.lstm = nn.LSTM(embedding_dim * 2, hidden_dim)
        # The linear layer that maps from hidden state space to tag space
        self.hidden2tag = nn.Linear(hidden_dim, tagset_size)
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # The axes semantics are (num_layers, minibatch_size, hidden_dim).
        # Note: these tensors are created on the CPU. If the model is moved to
        # a GPU, they must be moved to the same device (e.g. with to(device)),
        # otherwise the LSTM will crash.
        return (torch.zeros(1, 1, self.hidden_dim),
                torch.zeros(1, 1, self.hidden_dim))

    def forward(self, sentence):
        # sentence holds the indices of the two context words
        embeds = self.word_embeddings(sentence)
        # Flatten both embeddings into a single time step with batch size 1
        lstm_out, self.hidden = self.lstm(
            embeds.view(1, 1, -1), self.hidden)
        tag_space = self.hidden2tag(lstm_out.view(1, -1))
        tag_scores = F.log_softmax(tag_space, dim=1)
        return tag_scores
model = LSTMTagger(EMBEDDING_DIM, HIDDEN_DIM, VOCAB_SIZE, TAGSET_SIZE)
try:
    model.load_state_dict(torch.load(SAVE_PATH))
    print('model loaded successfully')
except FileNotFoundError:
    # No saved model exists yet, so start from the freshly initialized one
    pass
loss_function = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)
model.train()
for epoch in range(EPOCH_NUM):
    accuracy = 0
    for context, target in trigrams:
        # Step 1. Prepare the inputs to be passed to the model (i.e., turn the
        # words into integer indices and wrap them in a tensor)
        context_idxs = torch.tensor([word_to_ix[w] for w in context], dtype=torch.long)
        # Step 2. Recall that torch *accumulates* gradients. Before passing in a
        # new instance, you need to zero out the gradients from the old
        # instance, and also reset the LSTM hidden state
        model.zero_grad()
        model.hidden = model.init_hidden()
        # Step 3. Run the forward pass, getting log probabilities over next
        # words
        log_probs = model(context_idxs)
        _, predicted_class = torch.max(log_probs, 1)
        if predicted_class.item() == word_to_ix[target]:
            accuracy += 1
        # Step 4. Compute the loss function. (Again, torch wants the target
        # word index wrapped in a tensor)
        loss = loss_function(log_probs, torch.tensor([word_to_ix[target]], dtype=torch.long))
        # Step 5. Do the backward pass and update the parameters
        loss.backward()
        optimizer.step()
    # The loss printed here is that of the last trigram in the epoch only;
    # accuracy is the fraction of trigrams predicted correctly
    print('Epoch [%d/%d], Loss: %.4f, ACC: %.4f' % (epoch + 1, EPOCH_NUM, loss.item(), accuracy / len(trigrams)))
torch.save(model.state_dict(), SAVE_PATH)
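
One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: word_to_ix is built by enumerating a set of strings, and Python randomizes string hashing per process by default, so the word-to-index order can differ from one run to the next. The saved embedding rows and output units are tied to the indices of the run that saved them, so a different mapping in the loading run could make the restored weights look untrained. Below is a minimal sketch of building the mapping in a reproducible order and storing it next to the weights. The helper names and the NGramVocab.json file name are hypothetical and not part of the script above.

```python
import json
import os
import tempfile


def build_word_to_ix(tokens):
    """Map each distinct token to an index in a reproducible (sorted) order."""
    # sorted() removes the dependence on set iteration order, which for
    # strings can change between Python processes (hash randomization).
    return {word: i for i, word in enumerate(sorted(set(tokens)))}


def save_word_to_ix(word_to_ix, path):
    """Hypothetical helper: store the mapping alongside the saved weights."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(word_to_ix, f, ensure_ascii=False)


def load_word_to_ix(path):
    """Hypothetical helper: restore the mapping used when the model was saved."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


# Toy corpus standing in for test_sentence (an assumption for illustration)
tokens = "the dog will drink water and the cat will drink milk".split()
word_to_ix = build_word_to_ix(tokens)

# Write the mapping to a temporary directory and read it back
vocab_path = os.path.join(tempfile.mkdtemp(), "NGramVocab.json")
save_word_to_ix(word_to_ix, vocab_path)
restored = load_word_to_ix(vocab_path)

print(word_to_ix)
print(restored == word_to_ix)
```

In the script above, the equivalent change would be to enumerate sorted(vocab) instead of vocab. Alternatively, since torch.save accepts any picklable object, the mapping could be saved in the same file as the state dict by passing a dictionary that holds both.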