Making Predictions with a Time Series Classification Model


I am a total beginner with PyTorch and machine learning in general.

I trained a time series classification model to predict whether a mountain biker is in the air (jumping) or not, based on X, Y and Z acceleration data from the biker.

I used a tutorial from Venelin Valkov as a template, in which he predicted the surfaces robots were standing on, based on acceleration data from the robots.

As I followed his instructions with my own dataset, I am pretty confident that my model makes sense, and judging by the results and a confusion matrix it performs pretty well.

Now I want to evaluate the model by predicting the airtime of whole runs of a biker. Every run was filmed, so I could compare the predicted airtime with the ground truth from the video.
So I would like to load a CSV file containing the data of one run and try my model on it, to get a prediction for every sample: “Airtime” or “No Airtime”.

I have converted the CSV to a tensor and I know that I have to use model.eval and torch.no_grad for inference, but so far I have not succeeded.

So my question is whether it is possible to run my model on a single CSV file (not a whole dataset as in training) and make predictions, and what the easiest way to do so is.

It is of course possible to make individual predictions with your trained model, but how exactly you should do this depends, for example, on the data representation and model architecture you use.

I assume your tensor (let's call it x) is of shape [B,F], where each row represents one sample you want to classify (so B samples in total).
Then you should be able to make a prediction for each of the samples in your tensor with

predictions = model(x)

Please let me know if I misunderstood your problem. If so, maybe you could provide some of your code to make the question clearer.
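
A minimal sketch of that call for data read from a CSV file, assuming a model that accepts a [B,F] tensor. The CSV text, the column names x, y, z and the stand-in nn.Linear model are placeholders for illustration, not the poster's actual data or network.

```python
import io

import numpy as np
import torch
from torch import nn

# Placeholder CSV content; in practice this would be an open file handle.
csv_text = "x,y,z\n0.1,0.2,9.8\n0.0,0.1,0.3\n0.2,0.1,9.7\n"

# Parse the numeric columns, skipping the header row.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1, dtype=np.float32)
x = torch.from_numpy(data)  # shape [B, F] = [3, 3]

# Stand-in for a trained model with 3 input features and 2 classes (assumption).
model = nn.Linear(3, 2)

model.eval()               # switch layers such as dropout to inference behaviour
with torch.no_grad():      # no gradient tracking is needed for inference
    logits = model(x)                        # shape [B, 2]
    predicted_class = logits.argmax(dim=1)   # one class index per row
```

Here predicted_class holds one class index per row of the CSV; which index means “Airtime” depends on how the labels were encoded during training.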

Thanks for the quick response.

The input my model needs to make predictions seems to be of shape [batch size, length, features], so I expanded my data to a 3D tensor:

df_test_conts = test_data[['x_test','y_test','z_test']]
test_conts = df_test_conts.astype(np.float32)
test_conts = torch.tensor(test_conts.values).cuda()
test_conts = test_conts.transpose(1, 0)
test_conts = test_conts.transpose(2, 0)
test_conts = test_conts.expand(1,2817,3)

So I end up with a tensor with a batch size of one and a length of 2817 (test.csv has 2817 samples).

Then I tried to make predictions like this:

# make predictions
with torch.no_grad():
  prediction = model(test_conts)

But the prediction I receive is now a tuple, not a tensor as I expected, and looks like this:

(0, tensor([[-5.3364,  4.6702]], device='cuda:0'))

I then tried to convert the prediction back to a pandas DataFrame, but this is not possible with a tuple. I am also not quite sure whether the prediction did what I wanted it to do, which is to predict for every sample whether it is “air” or “noair”.

Sorry if I am making stupid mistakes or explaining badly, but I am really new to this topic.

I think it would be helpful if you could provide the implementation of your model.
Moreover I am not quite sure what you are doing here:

In my opinion it seems that you could replace that by simply doing:


Could you also provide the shape of test_conts after this line of code?

Yeah, you are right, this does the same thing, my bad.

This is the shape of test_conts after adding the fake batch dimension:

torch.Size([1, 2817, 3])

Here is the implementation of my model:

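from torch import nn

class SequenceModel(nn.Module):

  def __init__(self, n_features, n_classes, n_hidden=256, n_layers=3):
    super().__init__()  # required before submodules can be registered

    self.lstm = nn.LSTM(
        input_size = n_features,
        hidden_size = n_hidden,
        num_layers = n_layers,
        batch_first = True,
        dropout = 0.75
    )

    self.classifier = nn.Linear(n_hidden, n_classes)

  def forward(self, x):
    # hidden: [n_layers, batch, n_hidden], the final hidden state of every layer
    _, (hidden, _) = self.lstm(x)

    out = hidden[-1]  # final hidden state of the last layer
    return self.classifier(out)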

class StatePredictor(pl.LightningModule):

  def __init__(self, n_features: int, n_classes: int):
    super().__init__()
    self.model = SequenceModel(n_features, n_classes)
    self.criterion = nn.CrossEntropyLoss()

  def forward(self, x, labels=None):
    output = self.model(x)
    loss = 0
    if labels is not None:
      loss = self.criterion(output, labels)
    return loss, output

  def training_step(self, batch, batch_idx):
    sequences = batch["sequence"]
    labels = batch["label"]
    loss, outputs = self(sequences, labels)
    predictions = torch.argmax(outputs, dim=1)
    step_accuracy = accuracy(predictions, labels)

    self.log("train_loss", loss, prog_bar=True, logger=True)
    self.log("train_accuracy", step_accuracy, prog_bar=True, logger=True)
    return {"loss": loss, "accuracy": step_accuracy}

  def validation_step(self, batch, batch_idx):
    sequences = batch["sequence"]
    labels = batch["label"]
    loss, outputs = self(sequences, labels)
    predictions = torch.argmax(outputs, dim=1)
    step_accuracy = accuracy(predictions, labels)

    self.log("val_loss", loss, prog_bar=True, logger=True)
    self.log("val_accuracy", step_accuracy, prog_bar=True, logger=True)
    return {"loss": loss, "accuracy": step_accuracy}

  def test_step(self, batch, batch_idx):
    sequences = batch["sequence"]
    labels = batch["label"]
    loss, outputs = self(sequences, labels)
    predictions = torch.argmax(outputs, dim=1)
    step_accuracy = accuracy(predictions, labels)

    self.log("test_loss", loss, prog_bar=True, logger=True)
    self.log("test_accuracy", step_accuracy, prog_bar=True, logger=True)
    return {"loss": loss, "accuracy": step_accuracy}

  def configure_optimizers(self):
    return optim.Adam(self.parameters(), lr=0.0001)

model = StatePredictor(

The model implementation looks fine to me.
If I run your code with

test_conts = torch.rand((1, 2817, 3))
model = SequenceModel(3, 2, 256, 3)
prediction = model(test_conts)

I receive a prediction of shape [1,2], so everything seems to work fine here.

Given that you use the model implementation above, SequenceModel on its own should not be able to return a tuple at all.
Did you perhaps change your model implementation in the meantime?
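
Going by the code posted above, one likely source of the tuple is the StatePredictor wrapper rather than SequenceModel: its forward returns (loss, output), and loss stays at 0 when no labels are passed. The sketch below reproduces only that return behaviour; TinyWrapper is a hypothetical stand-in in which a linear layer replaces the LSTM.

```python
import torch
from torch import nn


class TinyWrapper(nn.Module):
    """Hypothetical stand-in that mirrors the return value of StatePredictor.forward."""

    def __init__(self):
        super().__init__()
        self.model = nn.Linear(3, 2)          # replaces SequenceModel for brevity
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        output = self.model(x)
        loss = 0                               # placeholder when no labels are given
        if labels is not None:
            loss = self.criterion(output, labels)
        return loss, output


wrapper = TinyWrapper().eval()
with torch.no_grad():
    result = wrapper(torch.rand(1, 3))

loss, logits = result                  # unpack the (loss, output) tuple
predicted = logits.argmax(dim=1)       # index of the larger logit per row
```

Read this way, the leading 0 in (0, tensor([[-5.3364, 4.6702]])) would be the loss placeholder, not a class, and the argmax of those two logits is index 1.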

I get the same shape with this code, so no, I don't think I changed my model implementation.

I think the cause of this output (the tuple shown earlier) was that in training I had sequences of 30 samples each. Within a sequence, all samples were labelled 0 or 1, i.e. “No Air” or “Air”. The model tries to find out which sequences of my training dataset are 0 or 1, which worked pretty well.

So when I feed in new, unseen data with X samples and no sequencing, the model apparently tries to decide whether the data as a whole is 0 or 1 (0 in this case). But I now want my model to predict the label of every line/sample of the data, and I cannot figure out how to do so. I hope this makes my problem clearer.

OK, now I see. As far as I understand, you trained your model on a sequence2one task, i.e. there was ONE label associated with a whole sequence. Now you want to use this trained model in a sequence2sequence task; more precisely, you want to classify each element in the sequence as “No Air” or “Air”.
Is that right?

The implementation of your model that you posted above is, however, only suitable for sequence2one tasks. The LSTM cell takes one element of the sequence (x_i) as input at a time, does some calculations based on this input and two “hidden states” (h_i and c_i), updates the hidden states and generates an output variable (y_i). The updated hidden states are then used together with the next sequence element (x_i+1) as input in the next time step. You can find an illustration of the concept here: PyTorch LSTM Cell (taken from this post).

In your implementation you only take the final hidden state h_N of the LSTM and pass it to a classifier. This rests on the reasonable assumption that the final hidden state contains some information about the whole sequence. The classifier finally predicts “No Air” or “Air” from this final hidden state. As you can see, this will only work for sequence2one tasks, as your classifier can only predict one label for the whole sequence.

If you want to use your LSTM for a sequence2sequence task, I see 2 options:

  1. You change your implementation to return not only the last hidden state h_N but also the generated outputs y_i. You could try using your trained classifier to predict “No Air” / “Air” from every y_i and assign the prediction to the input of that time step (x_i), or even reduce the number of hidden features to the number of your classes, apply a softmax to the outputs y_i and take that as predictions. However, I am not sure whether this will give you reasonable results.
  2. You classify “No Air” / “Air” from every intermediate hidden state h_i. For the last layer of nn.LSTM these are the same values as the outputs y_i that the module already returns; a custom loop over the LSTM cell would only be needed to also collect the cell states c_i or the hidden states of lower layers.
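
A minimal sketch of the per-time-step idea, under the assumption of a unidirectional nn.LSTM with batch_first=True. PerStepClassifier and its small layer sizes are hypothetical and not the poster's trained model; the sketch only shows the shapes involved.

```python
import torch
from torch import nn


class PerStepClassifier(nn.Module):
    """Sketch: emit one set of class logits per time step instead of per sequence."""

    def __init__(self, n_features=3, n_classes=2, n_hidden=16, n_layers=1):
        super().__init__()
        self.lstm = nn.LSTM(n_features, n_hidden, num_layers=n_layers, batch_first=True)
        self.classifier = nn.Linear(n_hidden, n_classes)

    def forward(self, x):
        # outputs: [batch, length, n_hidden], the last layer's hidden state at every step
        # hidden:  [n_layers, batch, n_hidden], the final hidden state of every layer
        outputs, (hidden, _) = self.lstm(x)
        # nn.Linear acts on the last dimension: [batch, length, n_classes]
        return self.classifier(outputs), outputs, hidden


model = PerStepClassifier().eval()
x = torch.rand(1, 50, 3)                 # one run with 50 samples, 3 features
with torch.no_grad():
    logits, outputs, hidden = model(x)
labels = logits.argmax(dim=-1)           # [1, 50], one class index per sample
```

A classifier that was trained only on the final hidden state has never been asked to label early time steps, so a model like this would most likely need to be trained with one label per sample before its per-step predictions can be trusted.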

I hope I was finally able to solve your question :slight_smile:


Thanks a lot for this thorough explanation, now I understand my model a lot better :wink:

I did not succeed with these 2 suggestions, but I managed to make predictions for single runs by sequencing them like the training dataset, with each sample as its own sequence (e.g. 2817 sequences in a file with 2817 samples). I also provided the Y data containing the ground truth and the corresponding sequence (“sample”) number. So it is the same procedure as in training.

The prediction now works, but it does not perform as well as in training. In training I get an accuracy of 95%. With the prediction I now classify 95% of NOAIR correctly, but only 77% of AIR.

The ground truth is 224 samples AIR and 2593 samples NOAIR.
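
As a rough check on what these figures imply, the per-class rates can be combined into an overall accuracy and compared with a trivial baseline. This is only a sketch: it uses the rounded percentages reported above, so the results are approximate.

```python
# Counts and per-class rates as reported in the post (rates are rounded).
n_air, n_noair = 224, 2593
recall_air, recall_noair = 0.77, 0.95

total = n_air + n_noair
correct = recall_air * n_air + recall_noair * n_noair

overall_accuracy = correct / total                     # weighted by class size
balanced_accuracy = (recall_air + recall_noair) / 2    # each class counts equally
majority_baseline = n_noair / total                    # always predicting NOAIR

print(f"overall accuracy:  {overall_accuracy:.3f}")    # 0.936
print(f"balanced accuracy: {balanced_accuracy:.3f}")   # 0.860
print(f"majority baseline: {majority_baseline:.3f}")   # 0.920
```

With roughly 92% of the samples being NOAIR, overall accuracy is dominated by that class, so the per-class rates (or a balanced accuracy) say more about how well AIR is detected than a single accuracy number does.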

I have tried the following to improve the performance:

  • Changing the number of n_hidden

  • Changing the batch size

  • Changing the dropout

  • Changing the learning rate

But this is the best result I could achieve. Are there any other ways to improve the performance of an LSTM model? Or do you think the training dataset (about 50000 samples) is not large enough for this task?

I am sorry for asking so many questions, but I could not find a solution elsewhere :slight_smile:

If I got it right, you are now making your predictions based on only one time step of the time series. Your model is therefore not able to exploit temporal dependencies.
If you want to improve your results, I would suggest trying one of the two options I described above again.
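
A different route, not one of the options discussed in this thread, would be to keep the sequence2one model and feed it overlapping windows of the same length as in training (30 samples), so that every prediction still sees temporal context. The sketch below only builds such windows with NumPy; sliding_windows is a hypothetical helper and the random array stands in for one run.

```python
import numpy as np


def sliding_windows(samples, window):
    """Return overlapping windows of shape [n - window + 1, window, features]."""
    n = len(samples)
    if n < window:
        raise ValueError("run is shorter than the window length")
    # Window i covers samples i .. i + window - 1 (step size of one sample).
    return np.stack([samples[i:i + window] for i in range(n - window + 1)])


# Stand-in for one run: 2817 samples with 3 acceleration channels.
run = np.random.rand(2817, 3).astype(np.float32)
windows = sliding_windows(run, window=30)   # shape [2788, 30, 3]
```

Each window can then be passed through the model as one sequence, and its predicted label assigned to, for example, the sample in the middle of the window. The first and last few samples of a run have no full window around them and would need separate handling.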

Apart from that, the problem you describe is that the accuracy on the test set is significantly lower than on the training set. Lower test accuracy is probably caused by one of the following:

  1. Overfitting on the training data
    You can check for that by using a validation set, and counter it by implementing early stopping during training.
  2. Your test data has a different underlying distribution than your training data
    Was it generated by a different data-generating process? This could, for example, be the case if the time series was recorded by a rider who is not in the training set, or on a hill that was not present in the training set. Are you using a single time series as the test set? Try to increase the test set size, e.g. use 60% of your data for training, 20% for validation and 20% for testing. That should help make your evaluation more robust.
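
A minimal sketch of such a 60/20/20 split, assuming the data can be grouped by run so that all samples of one run end up in the same subset. The run identifiers and the split_runs helper are hypothetical.

```python
import random


def split_runs(run_ids, train=0.6, val=0.2, seed=0):
    """Split whole runs into train/validation/test groups (the rest goes to test)."""
    ids = sorted(set(run_ids))
    random.Random(seed).shuffle(ids)        # reproducible shuffle
    n_train = int(round(train * len(ids)))
    n_val = int(round(val * len(ids)))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


# Hypothetical identifiers for 20 recorded runs.
run_ids = [f"run_{i:02d}" for i in range(20)]
train_runs, val_runs, test_runs = split_runs(run_ids)

print(len(train_runs), len(val_runs), len(test_runs))   # 12 4 4
```

Splitting by run rather than by individual sample keeps neighbouring, highly correlated samples of the same jump from appearing in both the training and the test set.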