Hello,

I am new to RNNs and am trying to understand what LSTM models can do, so I created a short, simple, contrived example that tries to predict the “next number” for a given input sequence of numbers. I trained on 13 unique sequences of 7 numbers, each repeated 40 times, together with the “next number” for each sequence. When I gave the model the exact same sequences, it predicted the next number correctly 100% of the time. However, when I gave it the same sequences with 1 of the 7 numbers changed and the other 6 unchanged, some predictions were not what I expected (i.e. they did not match the training data).
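For reference, here is a minimal plain-Python sketch of the training data described above. It mirrors what `createData` and `transform` in the program below appear to do. The names `values`, `windows`, `MULTIPLIER` and `REPEATS` are illustrative only and do not come from the program. The numbers are the indices 0 to 19 multiplied by 101, and each window of 7 consecutive numbers is paired with the number that follows it.

```python
# Constants matching the program below
SEQUENCE_LENGTH = 7
NUMBERS_IN_SEQUENCE = 20
MULTIPLIER = 101  # illustrative name; plays the role of ANOMALY_MULTIPLIER / transform() in the program
REPEATS = 40      # illustrative name; NUM_OF_TIMES_TO_TRAIN_WITH_SAME_DATA in the program

values = [n * MULTIPLIER for n in range(NUMBERS_IN_SEQUENCE)]  # 0, 101, ..., 1919

# Pair each window of 7 consecutive values with the value that follows it.
windows = []
for start in range(NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH):
    sequence = values[start:start + SEQUENCE_LENGTH]
    next_number = values[start + SEQUENCE_LENGTH]
    windows.append((sequence, next_number))

# Repeating the same 13 windows 40 times gives the whole training set.
training_examples = windows * REPEATS

print(len(windows))            # 13
print(len(training_examples))  # 520
print(windows[0])              # ([0, 101, 202, 303, 404, 505, 606], 707)
print(windows[-1])             # ([1212, 1313, 1414, 1515, 1616, 1717, 1818], 1919)
```

With `BATCH_SIZE = 13` this gives 40 batches per epoch. `DataLoader` does not shuffle by default, so each batch contains the 13 distinct windows exactly once.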

Am I doing something wrong? I expected the predictions to be 100% correct. Each sequence was repeated 40 times in the training data, and the test sequences were very similar to the training sequences, since only 1 of the 7 numbers was different. Or are my expectations incorrect?

Thank you.

If it helps, below is the output of the program (which shows 1 “unexpected” prediction):

```
training 40 times, next number of 707 for sequence [0, 101, 202, 303, 404, 505, 606]
training 40 times, next number of 808 for sequence [101, 202, 303, 404, 505, 606, 707]
training 40 times, next number of 909 for sequence [202, 303, 404, 505, 606, 707, 808]
training 40 times, next number of 1010 for sequence [303, 404, 505, 606, 707, 808, 909]
training 40 times, next number of 1111 for sequence [404, 505, 606, 707, 808, 909, 1010]
training 40 times, next number of 1212 for sequence [505, 606, 707, 808, 909, 1010, 1111]
training 40 times, next number of 1313 for sequence [606, 707, 808, 909, 1010, 1111, 1212]
training 40 times, next number of 1414 for sequence [707, 808, 909, 1010, 1111, 1212, 1313]
training 40 times, next number of 1515 for sequence [808, 909, 1010, 1111, 1212, 1313, 1414]
training 40 times, next number of 1616 for sequence [909, 1010, 1111, 1212, 1313, 1414, 1515]
training 40 times, next number of 1717 for sequence [1010, 1111, 1212, 1313, 1414, 1515, 1616]
training 40 times, next number of 1818 for sequence [1111, 1212, 1313, 1414, 1515, 1616, 1717]
training 40 times, next number of 1919 for sequence [1212, 1313, 1414, 1515, 1616, 1717, 1818]
performed 152 epochs of training
For sequence, [0, 101, 202, 303, 404, 505, 606], predicted next of 707 matches expected
For sequence, [101, 202, 303, 404, 505, 606, 707], predicted next of 808 matches expected
For sequence, [202, 303, 404, 505, 606, 707, 808], predicted next of 909 matches expected
For sequence, [303, 404, 505, 606, 707, 808, 909], predicted next of 1010 matches expected
For sequence, [404, 505, 606, 707, 808, 909, 1010], predicted next of 1111 matches expected
For sequence, [505, 606, 707, 808, 909, 1010, 1111], predicted next of 1212 matches expected
For sequence, [606, 707, 808, 909, 1010, 1111, 1212], predicted next of 1313 matches expected
For sequence, [707, 808, 909, 1010, 1111, 1212, 1313], predicted next of 1414 matches expected
For sequence, [808, 909, 1010, 1111, 1212, 1313, 1414], predicted next of 1515 matches expected
For sequence, [909, 1010, 1111, 1212, 1313, 1414, 1515], predicted next of 1616 matches expected
For sequence, [1010, 1111, 1212, 1313, 1414, 1515, 1616], predicted next of 1717 matches expected
For sequence, [1111, 1212, 1313, 1414, 1515, 1616, 1717], predicted next of 1818 matches expected
For sequence, [1212, 1313, 1414, 1515, 1616, 1717, 1818], predicted next of 1919 matches expected
For sequence, [0, 101, 202, 303, 404, 1515, 606], predicted next of 707 matches expected
For sequence, [101, 202, 303, 404, 1515, 606, 707], predicted next of 808 matches expected
For sequence, [202, 303, 404, 1515, 606, 707, 808], predicted next of 909 matches expected
For sequence, [303, 404, 1515, 606, 707, 808, 909], predicted next of 1010 matches expected
For sequence, [404, 1515, 606, 707, 808, 909, 1010], predicted next of 1111 matches expected
For sequence, [1515, 606, 707, 808, 909, 1010, 1111], predicted next of 1919 does not match expected next of 1212
For sequence, [606, 707, 808, 909, 1010, 1111, 1212], predicted next of 1313 matches expected
For sequence, [707, 808, 909, 1010, 1111, 1212, 1313], predicted next of 1414 matches expected
For sequence, [808, 909, 1010, 1111, 1212, 1313, 1414], predicted next of 1515 matches expected
For sequence, [909, 1010, 1111, 1212, 1313, 1414, 1515], predicted next of 1616 matches expected
For sequence, [1010, 1111, 1212, 1313, 1414, 1515, 1616], predicted next of 1717 matches expected
For sequence, [1111, 1212, 1313, 1414, 1515, 1616, 1717], predicted next of 1818 matches expected
For sequence, [1212, 1313, 1414, 1515, 1616, 1717, 1818], predicted next of 1919 matches expected
```
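In the anomalous test set, index 5 (505) is replaced by index 15 (1515), per `ELEMENT_INDEX_TO_BE_ANOMALY` and `REPLACEMENT_ANOMALY_INDEX` in `createData`. Only the first six test sequences contain that index, and the replaced number sits at a different position in each. The small plain-Python sketch below reuses those constants from the program to list the position per sequence. The dictionary `anomaly_positions` is an illustrative name. In the output above, the only mismatch is the sequence where 1515 is at position 0, the first element.

```python
# Constants copied from the program below
SEQUENCE_LENGTH = 7
NUMBERS_IN_SEQUENCE = 20
ELEMENT_INDEX_TO_BE_ANOMALY = 5   # index 5 (5 * 101 = 505) is replaced...
REPLACEMENT_ANOMALY_INDEX = 15    # ...by index 15 (15 * 101 = 1515)
ANOMALY_MULTIPLIER = 101

# Map the normal first number of each test window to the position (0-6) of the replaced number.
anomaly_positions = {}
for start in range(NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH):
    # Window `start` covers indices start .. start + SEQUENCE_LENGTH - 1.
    position = ELEMENT_INDEX_TO_BE_ANOMALY - start
    if 0 <= position < SEQUENCE_LENGTH:
        anomaly_positions[start * ANOMALY_MULTIPLIER] = position

for first_number, position in anomaly_positions.items():
    print(f"window normally starting with {first_number}: "
          f"{REPLACEMENT_ANOMALY_INDEX * ANOMALY_MULTIPLIER} at position {position}")
```

The window that normally starts with 505 is the one printed above as `[1515, 606, 707, 808, 909, 1010, 1111]`.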

Below is the full source of the program:

```python
import sys

import torch

MAX_ALLOWED_CONSECUTIVES_WITHOUT_IMPROVEMENT = 10
SEQUENCE_LENGTH = 7
NUMBERS_IN_SEQUENCE = 20
BATCH_SIZE = NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH
NUM_OF_LAYERS = 2
NUM_OF_HIDDEN_LAYERS = 64  # used as hidden_size (units per LSTM layer), not as a layer count
LOSS_BUFFER = 0.005
NUM_OF_TIMES_TO_TRAIN_WITH_SAME_DATA = 40
ANOMALY_MULTIPLIER = 101  # every index is multiplied by this to get the numbers in the sequences (see transform())


class LSTMModel(torch.nn.Module):

    def __init__(self, num_of_different_possible_numbers):
        super(LSTMModel, self).__init__()
        self.lstm = torch.nn.LSTM(input_size=1, hidden_size=NUM_OF_HIDDEN_LAYERS, num_layers=NUM_OF_LAYERS, batch_first=True)
        # One score (logit) per possible "next number"; each number is treated as a class label
        self.linear = torch.nn.Linear(NUM_OF_HIDDEN_LAYERS, num_of_different_possible_numbers)

    def forward(self, x):
        # x has shape (batch, SEQUENCE_LENGTH, 1) because batch_first=True
        h0 = torch.zeros(NUM_OF_LAYERS, x.size(0), NUM_OF_HIDDEN_LAYERS)
        c0 = torch.zeros(NUM_OF_LAYERS, x.size(0), NUM_OF_HIDDEN_LAYERS)
        out, _ = self.lstm(x, (h0, c0))  # out has shape (batch, SEQUENCE_LENGTH, NUM_OF_HIDDEN_LAYERS)
        out = self.linear(out[:, 0, :])  # uses the LSTM output at time step 0 (the first element)
        return out


def train():
    # Train on inputs and outputs that we create; keep going until the loss improvement diminishes
    inputs, outputs = createData(True, NUM_OF_TIMES_TO_TRAIN_WITH_SAME_DATA, False)
    dataset = torch.utils.data.TensorDataset(torch.tensor(inputs, dtype=torch.float), torch.tensor(outputs))
    dataloader = torch.utils.data.DataLoader(dataset, batch_size=BATCH_SIZE)
    criterion = torch.nn.CrossEntropyLoss()
    model = LSTMModel((NUMBERS_IN_SEQUENCE - 1) * ANOMALY_MULTIPLIER + 1)  # - 1 and + 1 both since index is 0-based
    optimizer = torch.optim.Adam(model.parameters())
    total_batches = len(dataloader)
    num_of_epochs_so_far = 0
    min_loss_so_far = 100  # more than enough for this experiment
    consecutives_without_enough_improvement = 0

    while True:
        training_loss = 0
        for step, (sequences, numbersAfterSequence) in enumerate(dataloader):
            # Reshape to (batch, SEQUENCE_LENGTH, 1): 1 is input_size, since each time step has 1 feature,
            # and -1 tells view() to infer that dimension (the batch size) from the others.
            sequences = sequences.clone().detach().view(-1, SEQUENCE_LENGTH, 1)
            output = model(sequences)
            loss = criterion(output, numbersAfterSequence)
            optimizer.zero_grad()
            loss.backward()
            training_loss += loss.item()
            optimizer.step()

        num_of_epochs_so_far += 1

        if (training_loss / total_batches) < (min_loss_so_far - LOSS_BUFFER):
            consecutives_without_enough_improvement = 0  # reset
            if (training_loss / total_batches) < min_loss_so_far:
                # The current loss was an improvement
                min_loss_so_far = training_loss / total_batches
        else:
            # The current loss was not enough of an improvement
            consecutives_without_enough_improvement += 1
            if consecutives_without_enough_improvement > MAX_ALLOWED_CONSECUTIVES_WITHOUT_IMPROVEMENT:
                break

    print('performed ' + str(num_of_epochs_so_far) + ' epochs of training')
    return model


def predict(model_trained):
    with torch.no_grad():  # gradients are not needed for prediction
        # First the exact training sequences, then the same sequences with one anomalous number
        inputs, outputs_expected = createData(False, 1, False)
        for i in range(0, len(inputs)):
            outputs_predicted = model_trained(torch.tensor(inputs[i], dtype=torch.float).view(-1, SEQUENCE_LENGTH, 1))
            analyze(inputs[i], outputs_expected[i], outputs_predicted)

        inputs, outputs_expected = createData(False, 1, True)
        for i in range(0, len(inputs)):
            outputs_predicted = model_trained(torch.tensor(inputs[i], dtype=torch.float).view(-1, SEQUENCE_LENGTH, 1))
            analyze(inputs[i], outputs_expected[i], outputs_predicted)


# inputs is a list of lists of numbers and outputs is a list of numbers
# introduce_anomaly is a boolean of whether or not to change one of the numbers in the data to be anomalous
def createData(data_will_be_used_for_training=True, num_of_iterations=1, introduce_anomaly=True):
    ELEMENT_INDEX_TO_BE_ANOMALY = 5
    REPLACEMENT_ANOMALY_INDEX = 15  # Note, this number and the one above are related to NUMBERS_IN_SEQUENCE, so change carefully if need be
    inputs = []
    outputs = []
    for training_pass in range(0, num_of_iterations):
        for i in range(0, NUMBERS_IN_SEQUENCE - SEQUENCE_LENGTH):
            input_list = []
            for j in range(0, SEQUENCE_LENGTH):
                if introduce_anomaly and (i + j == ELEMENT_INDEX_TO_BE_ANOMALY):
                    input_list.append(transform(REPLACEMENT_ANOMALY_INDEX))
                else:
                    input_list.append(transform(i + j))
            inputs.append(input_list)
            outputs.append(transform(i + SEQUENCE_LENGTH))
            if data_will_be_used_for_training and training_pass == 0:
                print('training ' + str(num_of_iterations) + ' times, next number of ' + str(transform(i + SEQUENCE_LENGTH)) + ' for sequence ' + str(input_list))
    return (inputs, outputs)


# This function exists to avoid sequences that are just consecutive numbers (but even when the numbers
# are consecutive, there are still predictions that do not match the training data).
def transform(input_number):
    return input_number * ANOMALY_MULTIPLIER  # avoid a sequence that just adds 1 (but it did not seem to matter anyway)


def analyze(input_sequence, output_expected, outputs_predicted):
    sortedOutput = torch.argsort(outputs_predicted, 1)  # class indices sorted by score, lowest first
    predicted_next = sortedOutput[0][-1:].item()  # class index with the highest score
    if predicted_next == output_expected:
        print('For sequence, ' + str(input_sequence) + ', predicted next of ' + str(predicted_next) + ' matches expected')
    else:
        print('For sequence, ' + str(input_sequence) + ', predicted next of ' + str(predicted_next) + ' does not match expected next of ' + str(output_expected))


# Main program
model = train()
predict(model)
sys.exit(0)
```
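With `batch_first=True`, the LSTM output `out` has shape `(batch, seq_len, hidden_size)`, so `out[:, 0, :]` in `forward` selects the output after the first time step only. The sketch below is one way to check which inputs that slice can depend on. It assumes a CPU run with a fixed seed and an LSTM of the same shape as `LSTMModel`. It uses small random inputs instead of the post's numbers, and `x_changed`, `first_step_same` and `last_step_same` are illustrative names. Changing a later element of the sequence should leave the first-step output unchanged while changing the last-step output `out[:, -1, :]`.

```python
import torch

torch.manual_seed(0)  # repeatable random weights and inputs

# Same LSTM shape as in LSTMModel: 1 input feature, 64 hidden units, 2 layers.
lstm = torch.nn.LSTM(input_size=1, hidden_size=64, num_layers=2, batch_first=True)

# One sequence of 7 time steps with 1 feature each (small random values, an assumption).
x = torch.randn(1, 7, 1)

# Copy the sequence and change only one element after the first time step.
x_changed = x.clone()
x_changed[0, 3, 0] += 1.0

with torch.no_grad():
    out, _ = lstm(x)                  # initial states default to zeros, as in forward()
    out_changed, _ = lstm(x_changed)  # out shape: (batch, seq_len, hidden_size)

# Output after time step 0 (what out[:, 0, :] selects) has only seen x[:, 0].
first_step_same = torch.allclose(out[:, 0, :], out_changed[:, 0, :])
# Output after the last time step (out[:, -1, :]) has seen the whole sequence.
last_step_same = torch.allclose(out[:, -1, :], out_changed[:, -1, :])

print("first-step output unchanged:", first_step_same)
print("last-step output unchanged:", last_step_same)
```

This would be consistent with the output above, where the only mismatch is the sequence whose first element was replaced. Reading the last time step with `out[:, -1, :]` is the common choice when the whole sequence should influence the prediction.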