LSTM capabilities for sequence prediction

Hello,
I am new to RNNs and am trying to understand the capabilities of LSTM models, so I created a contrived, short, and simple example that attempts to predict the “next number” given a specific input sequence of numbers. I trained on 13 unique sequences of 7 numbers, each repeated 40 times, along with the “next number” for each sequence. Given the exact same sequences, the prediction of the next number following the sequence was correct 100% of the time. However, given the same sequences with 1 of the 7 numbers different and the other 6 the same, some predictions were not as expected (i.e. they did not match the training data).

Am I doing something incorrect? I expected the predictions to be 100% correct, since each sequence was repeated 40 times in the training data and the sequences to predict for were very similar to the training sequences (only 1 of the 7 numbers differed). Or are my expectations themselves incorrect?

Thank you.

If it helps, below is the output of the program (which shows 1 “unexpected” prediction):

training 40 times, next number of 707 for sequence [0, 101, 202, 303, 404, 505, 606]
training 40 times, next number of 808 for sequence [101, 202, 303, 404, 505, 606, 707]
training 40 times, next number of 909 for sequence [202, 303, 404, 505, 606, 707, 808]
training 40 times, next number of 1010 for sequence [303, 404, 505, 606, 707, 808, 909]
training 40 times, next number of 1111 for sequence [404, 505, 606, 707, 808, 909, 1010]
training 40 times, next number of 1212 for sequence [505, 606, 707, 808, 909, 1010, 1111]
training 40 times, next number of 1313 for sequence [606, 707, 808, 909, 1010, 1111, 1212]
training 40 times, next number of 1414 for sequence [707, 808, 909, 1010, 1111, 1212, 1313]
training 40 times, next number of 1515 for sequence [808, 909, 1010, 1111, 1212, 1313, 1414]
training 40 times, next number of 1616 for sequence [909, 1010, 1111, 1212, 1313, 1414, 1515]
training 40 times, next number of 1717 for sequence [1010, 1111, 1212, 1313, 1414, 1515, 1616]
training 40 times, next number of 1818 for sequence [1111, 1212, 1313, 1414, 1515, 1616, 1717]
training 40 times, next number of 1919 for sequence [1212, 1313, 1414, 1515, 1616, 1717, 1818]
performed 152 epochs of training
For sequence, [0, 101, 202, 303, 404, 505, 606], predicted next of 707 matches expected
For sequence, [101, 202, 303, 404, 505, 606, 707], predicted next of 808 matches expected
For sequence, [202, 303, 404, 505, 606, 707, 808], predicted next of 909 matches expected
For sequence, [303, 404, 505, 606, 707, 808, 909], predicted next of 1010 matches expected
For sequence, [404, 505, 606, 707, 808, 909, 1010], predicted next of 1111 matches expected
For sequence, [505, 606, 707, 808, 909, 1010, 1111], predicted next of 1212 matches expected
For sequence, [606, 707, 808, 909, 1010, 1111, 1212], predicted next of 1313 matches expected
For sequence, [707, 808, 909, 1010, 1111, 1212, 1313], predicted next of 1414 matches expected
For sequence, [808, 909, 1010, 1111, 1212, 1313, 1414], predicted next of 1515 matches expected
For sequence, [909, 1010, 1111, 1212, 1313, 1414, 1515], predicted next of 1616 matches expected
For sequence, [1010, 1111, 1212, 1313, 1414, 1515, 1616], predicted next of 1717 matches expected
For sequence, [1111, 1212, 1313, 1414, 1515, 1616, 1717], predicted next of 1818 matches expected
For sequence, [1212, 1313, 1414, 1515, 1616, 1717, 1818], predicted next of 1919 matches expected
For sequence, [0, 101, 202, 303, 404, 1515, 606], predicted next of 707 matches expected
For sequence, [101, 202, 303, 404, 1515, 606, 707], predicted next of 808 matches expected
For sequence, [202, 303, 404, 1515, 606, 707, 808], predicted next of 909 matches expected
For sequence, [303, 404, 1515, 606, 707, 808, 909], predicted next of 1010 matches expected
For sequence, [404, 1515, 606, 707, 808, 909, 1010], predicted next of 1111 matches expected
For sequence, [1515, 606, 707, 808, 909, 1010, 1111], predicted next of 1919 does not match expected next of 1212
For sequence, [606, 707, 808, 909, 1010, 1111, 1212], predicted next of 1313 matches expected
For sequence, [707, 808, 909, 1010, 1111, 1212, 1313], predicted next of 1414 matches expected
For sequence, [808, 909, 1010, 1111, 1212, 1313, 1414], predicted next of 1515 matches expected
For sequence, [909, 1010, 1111, 1212, 1313, 1414, 1515], predicted next of 1616 matches expected
For sequence, [1010, 1111, 1212, 1313, 1414, 1515, 1616], predicted next of 1717 matches expected
For sequence, [1111, 1212, 1313, 1414, 1515, 1616, 1717], predicted next of 1818 matches expected
For sequence, [1212, 1313, 1414, 1515, 1616, 1717, 1818], predicted next of 1919 matches expected

Below is the full source of the program:

import sys
import torch

MAX_ALLOWED_CONSECUTIVES_WITHOUT_IMPROVEMENT = 10
SEQUENCE_LENGTH = 7
NUMBERS_IN_SEQUENCE = 20
BATCH_SIZE = NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH
NUM_OF_LAYERS = 2
NUM_OF_HIDDEN_LAYERS = 64 # despite the name, this is the hidden state size (units per layer) passed as hidden_size below
LOSS_BUFFER = 0.005
NUM_OF_TIMES_TO_TRAIN_WITH_SAME_DATA = 40
ANOMALY_MULTIPLIER = 101 # transform() multiplies every index by this, so the sequences are not simply consecutive integers

class LSTMModel(torch.nn.Module):
    def __init__(self, num_of_different_possible_numbers):
        super(LSTMModel, self).__init__()
        self.lstm =  torch.nn.LSTM(input_size=1, hidden_size=NUM_OF_HIDDEN_LAYERS, num_layers=NUM_OF_LAYERS, batch_first=True)
        self.linear = torch.nn.Linear(NUM_OF_HIDDEN_LAYERS, num_of_different_possible_numbers)

    def forward(self, x):
        h0 = torch.zeros(NUM_OF_LAYERS, x.size(0), NUM_OF_HIDDEN_LAYERS)
        c0 = torch.zeros(NUM_OF_LAYERS, x.size(0), NUM_OF_HIDDEN_LAYERS)
        out, _ = self.lstm(x, (h0, c0))
        out = self.linear(out[:, 0, :]) # classify using the LSTM output at the first timestep
        return out
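
# Note: forward() above feeds the linear layer the LSTM output at the FIRST
# timestep (out[:, 0, :]); for a unidirectional LSTM, that output depends only
# on the first element of the input sequence. A common alternative, sketched
# here as a comment (it is not what produced the log above), is the LAST
# timestep, whose output depends on the whole sequence:
#
#     out = self.linear(out[:, -1, :])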

def train():
    # train on inputs and outputs that we will create; perform loss calculations until we have a diminishing return
    inputs, outputs = createData(True, NUM_OF_TIMES_TO_TRAIN_WITH_SAME_DATA, False)
    dataset = torch.utils.data.TensorDataset(torch.tensor(inputs, dtype=torch.float), torch.tensor(outputs))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE)
    criterion = torch.nn.CrossEntropyLoss()
    model = LSTMModel((NUMBERS_IN_SEQUENCE - 1) * ANOMALY_MULTIPLIER + 1) # 19 * 101 + 1 = 1920 classes, one per value transform() can emit (0, 101, ..., 1919); - 1 then + 1 since indices are 0-based
    optimizer = torch.optim.Adam(model.parameters())
    total_batches = len(dataloader)
    num_of_epochs_so_far = 0
    min_loss_so_far = 100 # more than enough for this experiment
    consecutives_without_enough_improvement = 0
    while True:
        training_loss = 0
        for step, (sequences, numbersAfterSequence) in enumerate(dataloader):
            sequences = sequences.clone().detach().view(-1, SEQUENCE_LENGTH, 1) # last dim is 1 since each timestep carries a single feature; -1 lets view() infer the batch dimension
            output = model(sequences)
            loss = criterion(output, numbersAfterSequence)
            optimizer.zero_grad()
            loss.backward()
            training_loss += loss.item()
            optimizer.step()
        num_of_epochs_so_far += 1
        if (training_loss / total_batches) < (min_loss_so_far - LOSS_BUFFER):
            # Enough of an improvement: reset the counter and record the new minimum
            # (the outer comparison already implies the loss is below min_loss_so_far).
            consecutives_without_enough_improvement = 0
            min_loss_so_far = training_loss / total_batches
        else:
            # The current loss was not enough of an improvement
            consecutives_without_enough_improvement += 1
            if consecutives_without_enough_improvement > MAX_ALLOWED_CONSECUTIVES_WITHOUT_IMPROVEMENT:
                break
    print('performed ' + str(num_of_epochs_so_far) + ' epochs of training')

    return model

def predict(model_trained):
    inputs, outputs_expected = createData(False, 1, False)
    for i in range(0, len(inputs)):
        outputs_predicted = model_trained(torch.tensor(inputs[i], dtype=torch.float).view(-1, SEQUENCE_LENGTH, 1))
        analyze(inputs[i], outputs_expected[i], outputs_predicted)

    inputs, outputs_expected = createData(False, 1, True)
    for i in range(0, len(inputs)):
        outputs_predicted = model_trained(torch.tensor(inputs[i], dtype=torch.float).view(-1, SEQUENCE_LENGTH, 1))
        analyze(inputs[i], outputs_expected[i], outputs_predicted)
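
# The two prediction loops above run with autograd enabled and the model still
# in training mode. A minimal sketch of the usual inference pattern (the helper
# name predict_one is hypothetical, not part of the original program):
def predict_one(model_trained, sequence):
    model_trained.eval()  # switch the model to inference behavior
    with torch.no_grad():  # gradients are not needed for prediction
        return model_trained(torch.tensor(sequence, dtype=torch.float).view(-1, SEQUENCE_LENGTH, 1))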

# inputs is a list of lists of numbers and outputs is a list of numbers
# introduce_anomaly controls whether one of the numbers in the data is replaced with an anomalous value
def createData(data_will_be_used_for_training = True, num_of_iterations=1, introduce_anomaly=True):
    ELEMENT_INDEX_TO_BE_ANOMALY = 5
    REPLACEMENT_ANOMALY_INDEX = 15 # Note, this number and the one above are related to NUMBERS_IN_SEQUENCE so change carefully if need be
    inputs = []
    outputs = []
    for training_pass in range(0, num_of_iterations):
        for i in range(0, NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH):
            input_list = []
            for j in range(0, SEQUENCE_LENGTH):
                if introduce_anomaly and (i + j == ELEMENT_INDEX_TO_BE_ANOMALY):
                    input_list.append(transform(REPLACEMENT_ANOMALY_INDEX))
                else:
                    input_list.append(transform(i + j))
            inputs.append(input_list)
            outputs.append(transform(i + SEQUENCE_LENGTH))
            if data_will_be_used_for_training and training_pass == 0:
                print('training ' + str(num_of_iterations) + ' times, next number of ' + str(transform(i + SEQUENCE_LENGTH)) + ' for sequence ' + str(input_list))
    return (inputs, outputs)
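
# For example, with the defaults and introduce_anomaly=False, the first pair
# generated above is inputs[0] == [0, 101, 202, 303, 404, 505, 606] with
# outputs[0] == 707, matching the first line of the training log.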

# This function exists to prevent simple sequences that are just consecutive numbers (though even when
# the numbers are consecutive, there are still predictions that do not match the training data).
def transform(input_number):
    return (input_number * ANOMALY_MULTIPLIER) # e.g. transform(15) == 1515, the anomaly value seen in the log

def analyze(input_sequence, output_expected, outputs_predicted):
    sortedOutput = torch.argsort(outputs_predicted, 1) # class indices ordered from lowest to highest logit
    predicted_next = sortedOutput[0][-1:].item() # the last index is the argmax, i.e. the predicted class
    if predicted_next == output_expected:
        print('For sequence, ' + str(input_sequence) + ', predicted next of ' + str(predicted_next) + ' matches expected')
    else:
        print('For sequence, ' + str(input_sequence) + ', predicted next of ' + str(predicted_next) + ' does not match expected next of ' + str(output_expected))
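
# An equivalent, more direct form of the argsort lines in analyze() (a sketch;
# the argsort version above is what actually ran):
#
#     predicted_next = torch.argmax(outputs_predicted, dim=1)[0].item()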

# Main program
model = train()
predict(model)

sys.exit(0)
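
One note on reproducibility: the model weights are randomly initialized, so the exact results (including which predictions mismatch) can differ from run to run unless the random number generator is seeded before train() is called, for example:

torch.manual_seed(0) # fix PyTorch's RNG so weight initialization is repeatable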