Input Data for Time Series Classification (Need Help)


Full disclosure up front: I am new to machine learning and PyTorch.

I would like to create a neural network using PyTorch that takes time series data as input and classifies the wave shape as output. I am not sure, but I think this would be referred to as Time Series Classification.

Anyway, suppose that there are only three possible wave shapes:

  • Sine wave
  • Triangle wave
  • Square wave

More specifically, each time series input to the neural network would consist of three individual waveforms, all with the same wave shape but with different phases.

To make things clearer, consider an example input that uses the sine wave shape. In that case, the time series input to the neural network would look like this:

Note that each of the three waveforms contains the same number of samples. In the example shown above, there are 100 data points per waveform, for a total of 300 data points in this single time series input.
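One way to generate inputs like this for experimentation is sketched below with NumPy. The function name `make_waveforms`, the two cycles per window, and the specific phases are illustrative assumptions, not part of the original setup. Each call returns an array of shape [3, 100] (one row per waveform), which can be flattened to the 300 values described above.

```python
import numpy as np

def make_waveforms(shape: str, phases, n_samples: int = 100, cycles: float = 2.0) -> np.ndarray:
    """Hypothetical helper: one waveform per phase, shape [len(phases), n_samples]."""
    t = np.linspace(0.0, 1.0, n_samples, endpoint=False)
    # Broadcast phases (column vector) against time (row vector) -> [n_waves, n_samples]
    theta = 2 * np.pi * cycles * t[None, :] + np.asarray(phases, dtype=float)[:, None]
    if shape == "sine":
        return np.sin(theta)
    if shape == "square":
        # +1 where the sine is non-negative, -1 otherwise
        return np.where(np.sin(theta) >= 0, 1.0, -1.0)
    if shape == "triangle":
        # arcsin(sin(x)) is a triangle wave in [-pi/2, pi/2]; rescale to [-1, 1]
        return (2 / np.pi) * np.arcsin(np.sin(theta))
    raise ValueError(f"unknown wave shape: {shape}")

sample = make_waveforms("sine", phases=[0.0, np.pi / 3, 2 * np.pi / 3])
print(sample.shape)              # (3, 100)
print(sample.reshape(-1).shape)  # (300,)
```

Keeping the three waveforms as separate rows rather than one flat vector of 300 values preserves which samples belong to which waveform, which is useful later when choosing a model.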

I’ve generated several sets of time series data using the three wave shapes mentioned previously and concatenated them into one large pandas DataFrame, which I’ve called X_train. I also have another file called y_train, which contains a table with the associated series id and label for each time series in X_train.
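If X_train is in "long" format (one row per sample point), it can be reshaped into an array of shape [n_series, 3, 100]. The sketch below assumes hypothetical column names `series_id`, `wave`, `step`, and `value`, and assumes every series has all 3 × 100 rows; adjust it to the real frame. Sorting first makes the reshape independent of the row order.

```python
import numpy as np
import pandas as pd

N_WAVES, N_STEPS = 3, 100

# Hypothetical long-format X_train: one row per sample point.
# Each value encodes its own position so the reshape is easy to check.
rows = [
    (s, w, k, 1000.0 * s + 100.0 * w + k)
    for s in range(2) for w in range(N_WAVES) for k in range(N_STEPS)
]
X_train = pd.DataFrame(rows, columns=["series_id", "wave", "step", "value"])
X_train = X_train.sample(frac=1, random_state=0)  # rows may arrive in any order

def to_array(df: pd.DataFrame):
    """Return (ids, array of shape [n_series, N_WAVES, N_STEPS])."""
    # Sort so values are laid out series -> wave -> step before reshaping
    df = df.sort_values(["series_id", "wave", "step"])
    ids = df["series_id"].unique()
    values = df["value"].to_numpy(dtype=np.float32)
    return ids, values.reshape(len(ids), N_WAVES, N_STEPS)

ids, X = to_array(X_train)
print(ids, X.shape)  # [0 1] (2, 3, 100)
print(X[1, 2, 5])    # 1205.0
```

Returning `ids` alongside the array keeps a record of which row of X belongs to which series, which is what the labels in y_train need to be matched against.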

To put some numbers on it, suppose I have 10 time series, each with a total of 300 data points as described above.
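With 10 series, the inputs would form an array of shape [10, 3, 100] (or [10, 300] if flattened) and the labels an array of shape [10]. A common choice for classification is to store each label as an integer class index, which is what `nn.CrossEntropyLoss` expects as its target. The sketch below assumes y_train has `series_id` and `label` columns with string labels; the class order in `CLASSES` is an arbitrary, hypothetical choice.

```python
import numpy as np
import pandas as pd

# Hypothetical y_train: one row per series, possibly in a different order than X.
y_train = pd.DataFrame({
    "series_id": [3, 0, 2, 1],
    "label": ["square", "sine", "triangle", "sine"],
})

CLASSES = ["sine", "triangle", "square"]  # fixed class order (an assumption)
class_to_index = {name: i for i, name in enumerate(CLASSES)}

def labels_for(ids, y_df: pd.DataFrame) -> np.ndarray:
    """Return one integer class index per series, in the same order as `ids`."""
    aligned = y_df.set_index("series_id").loc[ids, "label"]
    return aligned.map(class_to_index).to_numpy(dtype=np.int64)

ids = np.array([0, 1, 2, 3])  # the order in which the series appear in X
y = labels_for(ids, y_train)
print(y)  # [0 0 1 2]
```

Aligning on `series_id` rather than relying on row order guards against X and y being stored in different orders.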

My question is, how should I structure the X and y training data so that I can use them to train my neural network in PyTorch? My guess is that I would need to convert the data into tensors, but I am not sure how to go about doing that or what shape the tensors should have. Is there a function to convert pandas DataFrames into PyTorch tensors? Do I need to make any modifications to the DataFrame before conversion?

Thanks for all your help!

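Since a pandas.DataFrame stores its data in numpy arrays internally, you could create PyTorch tensors via tensor = torch.from_numpy(df[...].to_numpy()) (and additionally call .clone() if you don’t want the tensor to share memory with the underlying numpy array). Note that torch.from_numpy expects a numpy array rather than a DataFrame, and it keeps the array’s dtype, so float64 data becomes a torch.float64 tensor; call .float() if your model uses the default float32 parameters.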
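A minimal sketch of that conversion, assuming a hypothetical wide DataFrame with one row per series and 300 value columns ordered as 100 samples of waveform 0, then waveform 1, then waveform 2. Requesting float32 in `to_numpy` does the dtype conversion up front, and the flat 300 values are then split into the three waveforms.

```python
import numpy as np
import pandas as pd
import torch

# Hypothetical wide-format data: one row per series, 300 columns
# (100 samples of wave 0, then wave 1, then wave 2).
df = pd.DataFrame(np.random.default_rng(0).normal(size=(10, 300)))

# torch.from_numpy expects an ndarray, not a DataFrame, so convert first.
# float32 matches the default dtype of PyTorch layer parameters.
arr = df.to_numpy(dtype=np.float32)
x = torch.from_numpy(arr)  # shares memory with `arr`
x_copy = x.clone()         # independent copy, unaffected by later changes to `arr`

# Split the 300 values of each row into 3 waveforms of 100 samples each.
x3 = x.reshape(len(df), 3, 100)
print(x.dtype, x.shape, x3.shape)  # torch.float32 torch.Size([10, 300]) torch.Size([10, 3, 100])
```

The reshape is only correct if the columns really are grouped by waveform; if they are interleaved instead, the layout would need to be adjusted first.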
The expected input shape depends on the module you use, and most modules expect an input of shape [batch_size, *], where * denotes any number of additional dimensions. You can check the docs for each module’s expected input shape. If you want to use RNNs, double-check the shapes: by default (batch_first=False), nn.RNN, nn.LSTM, and nn.GRU expect the sequence dimension in dim0, i.e. [seq_len, batch_size, features]; pass batch_first=True to use [batch_size, seq_len, features] instead.
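To illustrate the shape conventions, the sketch below batches a random stand-in for X with shape [10, 3, 100] and integer labels y, then feeds a batch to two example modules. The layer sizes are arbitrary assumptions, not a recommended architecture. `nn.Conv1d` takes [batch, channels, length], so the three waveforms act as channels; `nn.LSTM` with `batch_first=True` takes [batch, seq_len, features], so time has to be moved to dim 1.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(10, 3, 100)     # stand-in data: [batch, waves (channels), time steps]
y = torch.randint(0, 3, (10,))  # class indices 0..2, dtype int64

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)
xb, yb = next(iter(loader))     # xb: [4, 3, 100], yb: [4]

# Conv1d expects [batch, channels, length], which is the layout of X as-is.
conv = nn.Conv1d(in_channels=3, out_channels=8, kernel_size=5)
conv_out = conv(xb)
print(conv_out.shape)           # torch.Size([4, 8, 96])

# LSTM defaults to [seq_len, batch, features]; batch_first=True switches to
# [batch, seq_len, features], so time moves to dim 1 and waves become features.
lstm = nn.LSTM(input_size=3, hidden_size=16, batch_first=True)
out, (h_n, c_n) = lstm(xb.permute(0, 2, 1))  # input: [4, 100, 3]
print(out.shape, h_n.shape)     # torch.Size([4, 100, 16]) torch.Size([1, 4, 16])

# A linear layer on the last hidden state yields one logit per class.
head = nn.Linear(16, 3)
loss = nn.CrossEntropyLoss()(head(h_n[-1]), yb)
```

Note that `h_n` keeps the [num_layers, batch, hidden] layout regardless of `batch_first`, which is why the classifier uses `h_n[-1]`.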