Time series and LSTM model

Hi @ptrblck
I'm confused. I have a CSV dataset with 47 columns, and I want to predict the glucose level 30 minutes ahead. The glucose level is measured every 5 minutes by a CGM. Only the glucose values change over time; the other columns are static. I don't know how to set up my model. Can you help me, please?
Thanks and regards

This is the schema of my data:

I don't know which model architecture would work best for your use case, unfortunately.
Also, I would recommend avoiding tagging specific users, as it could demotivate others from posting a valid response, and the person you tag might not have a good answer, as in this case.

This data isn’t that dissimilar from the Titanic dataset, with the exception of the time series of glucose levels. Here is what I would try:

  1. Separate your inputs by category and normalize the data between 0 and 1. For glucose, you may just want to set the maximum to the highest recorded value and the minimum to zero, so you simply have norm_value = value / maximum. Normalize the glucose targets with the same maximum (a sketch follows this list).
  2. Keep the glucose levels in a sequential format, separated from the other data, and pass them into a 1D conv net.
  3. You might find that adding a positional encoding to the glucose data before input helps if you have varying amounts of glucose readings per sample (i.e. one has 60 minutes of prior data while another has only 30 minutes, etc.). The encoding should be done such that the time you're predicting, that is the target, is fixed to zero (see the positional encoding sketch after this list). This probably won't give much benefit, though, if every glucose input sequence is the same length.
  4. If the glucose data is variable in the number of data points, you can add an nn.AdaptiveAvgPool1d layer to the end of the conv net to fix the output size.
  5. Reshape the conv net outputs and attach them to the rest of the data with torch.cat().
  6. Put the combined data from step 5 into a fully connected neural network, with the final layer giving an output of size 1.
  7. Use nn.L1Loss() on the output and target.
  8. optim.Adam() would probably be sufficient as the optimizer.
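To make step 1 concrete, here is a minimal normalization sketch. The values below are made up for illustration; in practice the maximum should come from your full training set.

import torch

# hypothetical raw glucose readings (one sample's recent history)
glucose = torch.tensor([110., 125., 140., 150., 160., 155.])
glucose_max = 400.  # assumed highest recorded value in the dataset

glucose_norm = glucose / glucose_max  # now in [0, 1]

# normalize the 30-minutes-ahead target with the same maximum
target = torch.tensor([170.])
target_norm = target / glucose_max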
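And for step 3, here is one possible sinusoidal positional encoding where positions count backward from the prediction time, so the target sits at position zero and older readings get larger positions. This is only a sketch; the function name and the choice to ignore the 30-minute gap between the last reading and the target are my own simplifications.

import math
import torch

def reverse_positional_encoding(seq_len, d_model):
    # positions count backward from the prediction time, assuming the
    # sequence is ordered oldest to newest: the most recent reading gets
    # position 1, the oldest gets seq_len
    positions = torch.arange(seq_len, 0, -1, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                         * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)
    pe[:, 1::2] = torch.cos(positions * div_term)
    return pe

# 12 readings = 60 minutes of history at 5-minute intervals; encoding dim 16
pe = reverse_positional_encoding(12, 16)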

Thanks for your help!
But can you explain how I can take the output of the CNN and combine it with the other data? I'm a little confused.
Thanks and regards
@J_Johnson

Here is an overly simplified code example:

import torch
import torch.nn as nn

class CNN_FC_Model(nn.Module):
    def __init__(self, nonseq_channels, seq_channels=1, hidden_size=64, output_size=1):
        super().__init__()
        # CNN branch; add more layers before the avgpool as needed
        self.cnn1 = nn.Sequential(
            nn.Conv1d(in_channels=seq_channels, out_channels=hidden_size, kernel_size=3),
            nn.MaxPool1d(kernel_size=2, stride=2),
            nn.ReLU())
        self.avgpool = nn.AdaptiveAvgPool1d(output_size=1)

        # fully connected branch
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(p=0.3)
        self.fc1 = nn.Linear(nonseq_channels, hidden_size)
        self.fc2 = nn.Linear(hidden_size * 2, hidden_size)
        self.fc3 = nn.Linear(hidden_size, output_size)

    def forward(self, seq_data, nonseq_data):
        # run the sequential data through the CNN and pool to a fixed size
        seq_data = self.cnn1(seq_data)
        seq_data = self.avgpool(seq_data).flatten(1)

        # run the non-sequential data through the first fc layer
        nonseq_data = self.relu(self.dropout(self.fc1(nonseq_data)))

        # combine the two branches and continue through the fully connected layers
        all_data = torch.cat([seq_data, nonseq_data], dim=1)
        all_data = self.relu(self.dropout(self.fc2(all_data)))
        all_data = self.fc3(all_data)
        return all_data

model = CNN_FC_Model(20)

nonseq_data = torch.rand((32, 20))  # batch_size, number of miscellaneous features
seq_data = torch.rand((32, 1, 10))  # batch_size, number of channels, sequence length

out = model(seq_data, nonseq_data)

print(out.size())

You should probably include more CNN layers, unless the sequence is fairly short.
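Following steps 7 and 8 from my earlier post, here is a minimal training-loop sketch using the model and tensors above. The targets here are random placeholders for the normalized glucose values.

criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
targets = torch.rand((32, 1))  # placeholder normalized glucose targets

for epoch in range(10):
    optimizer.zero_grad()
    out = model(seq_data, nonseq_data)
    loss = criterion(out, targets)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: L1 loss {loss.item():.4f}")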