Prediction function for LSTM text generator gives 0 matches


I am working on a next-word prediction model using a regularised LSTM architecture, and I am struggling to understand why it performs badly at inference. I trained an LSTM on the Penn Treebank dataset and reached a test perplexity of about 94. This is not the best result, but I think it should still give some correct predictions.

When I run my prediction function on the test file, I get 0 matches, although I do see predictions. I am not sure whether exact matches are simply not expected to happen or whether my function itself has an error. Here is my code:

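# Model loading
import numpy as np
import torch

PATH = "./ptb_model.pth"
model = Model(vocab_size, hidden_size, layer_num, dropout, winit, lstm_type)
model = torch.load(PATH)  # the loaded object replaces the instance created above

print("Vocab Size: {}".format(vocab_size))
unique_index_word = dict((i,c) for i,c in enumerate(sorted(vocab)))
# Inverse mapping (word -> index), used in predict()
unique_word_index = dict((c,i) for i,c in unique_index_word.items())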

def predict(string, model):
    batch_size = 1
    string = clean_text(string)
    string = tokenize_text(string)
    string = to_lower_case_list(string)
    hidden = model.state_init(batch_size)
    try:
        # Feed the context one word at a time, carrying the hidden state forward
        for w in string:
            ix = torch.tensor([[unique_word_index[w]]])
            output, hidden = model(ix, hidden)
        # Pick uniformly at random among the 10 highest-scoring words
        _, top_ix = torch.topk(output[0], k=10)
        choices = top_ix.tolist()
        choice = np.random.choice(choices[0])
        out_word = unique_index_word[choice]
    except KeyError:
        # A word is missing from the vocabulary mapping
        out_word = "Error"
    return out_word

with open("ptb.test.txt", 'r') as f:
  lines = f.readlines()
  matches = 0
  total = 0
  for line in lines:
    text = line.split()  # split on any whitespace so the trailing newline is not kept as a token
    if len(text) > 7:
      exact = text[7]
      text = text[:7]
      text = " ".join(text)
      out = predict(text, model)
      print("Sequence: {}".format(text))
      print("Prediction: {} Exact: {}".format(out, exact))
      total += 1
      if out == exact:
        matches += 1
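
One thing worth checking in `predict` is the selection step: `np.random.choice` without a `p` argument picks uniformly among the 10 candidates and ignores their scores, so even when the correct word is ranked first it is returned only about one time in ten. The following is a minimal pure-Python sketch of that effect on invented toy scores (the helper names and the numbers are illustrative assumptions, not taken from my model):

```python
import random

def pick_uniform_from_top_k(scores, k, rng):
    """Mimic a uniform random choice over the k highest-scoring indices."""
    top_k = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return rng.choice(top_k)

def pick_greedy(scores):
    """Return the index of the single highest score (argmax)."""
    return max(range(len(scores)), key=lambda i: scores[i])

# Toy scores over a 20-word vocabulary; index 3 is the clear favourite.
scores = [0.01] * 20
scores[3] = 0.81
target = 3

rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 10_000
uniform_hits = sum(pick_uniform_from_top_k(scores, 10, rng) == target
                   for _ in range(trials))
greedy_hits = sum(pick_greedy(scores) == target for _ in range(trials))

uniform_rate = uniform_hits / trials  # close to 1/10
greedy_rate = greedy_hits / trials    # always picks the favourite here
print("uniform top-10: {:.3f}  greedy: {:.3f}".format(uniform_rate, greedy_rate))
```

For counting exact matches, taking the top-1 index (for example `output[0].argmax(dim=-1)`) is probably the fairer comparison; random sampling is better suited to generating varied text.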

I am also not sure how to include an accuracy measure in my training process. Here is my code for the perplexity:

def perplexity(data, model):
    with torch.no_grad():
        losses = []
        states = model.state_init(batch_size)
        accuracy = 0
        for x, y in data:
            scores, states = model(x, states)
            loss = nll_loss(scores, y)
            losses.append(loss.item())  # without this, losses stays empty and np.mean(losses) is nan
            ps = torch.exp(scores)
            #equality = ([0] == ps.max(dim = 1))
            #accuracy += equality.type(torch.FloatTensor).mean()
    return np.exp(np.mean(losses)), np.mean(losses), accuracy
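
For the accuracy part, the usual approach is to compare the argmax of the scores with the targets. Below is a minimal sketch in NumPy, assuming `scores` has shape `(N, vocab_size)` and holds log-probabilities, and `y` has shape `(N,)` and holds word indices; both shapes are assumptions about my model's output, and `batch_accuracy` is a hypothetical helper name. With PyTorch tensors the equivalent would be along the lines of `(scores.argmax(dim=1) == y).float().mean()`.

```python
import numpy as np

def batch_accuracy(log_probs, targets):
    """Fraction of rows whose highest-scoring class equals the target index."""
    predicted = log_probs.argmax(axis=1)
    return float((predicted == targets).mean())

# Toy batch: 4 positions, 3-word vocabulary (values invented for illustration).
log_probs = np.log(np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]))
targets = np.array([0, 1, 0, 0])

acc = batch_accuracy(log_probs, targets)
print("accuracy: {:.2f}".format(acc))
```

Inside `perplexity`, the per-batch value could be collected in a list and averaged at the end, in the same way as the losses.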

If I’m not wrong, since I have 0 matches on inference, wouldn’t my model accuracy be 0? This just seems a bit strange to me. If anyone knows what’s wrong, I would be very grateful!
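
To make the relationship between the two metrics concrete: perplexity is the exponential of the mean negative log-likelihood, so a perplexity of 94 means the geometric-mean probability given to the correct word is about 1/94, not that the top-1 accuracy is 0. The toy calculation below uses invented probabilities (not values from my model) to show a fairly high perplexity alongside a non-zero accuracy:

```python
import math

# Probability the model assigns to the correct word at four positions
# (invented numbers: confident and right twice, badly wrong twice).
p_correct = [0.5, 0.5, 0.001, 0.001]

mean_nll = -sum(math.log(p) for p in p_correct) / len(p_correct)
ppl = math.exp(mean_nll)

# Geometric-mean probability of the correct word implied by a perplexity of 94.
implied_prob = 1 / 94

print("toy perplexity: {:.2f}".format(ppl))
print("implied probability at perplexity 94: {:.4f}".format(implied_prob))
```

In this toy case the model could still be right at the two confident positions, so the accuracy would be 50% even though the perplexity is in the tens.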