How can I fit time series data with different data points

leminhson22 · March 21, 2026, 7:22am

Hello everyone, I’m currently working on a glucose prediction problem for type 1 diabetes patients. However, I’m encountering quite a few issues in the data processing step, so I’d like to ask for your opinions.
The dataset includes 17 people with type 1 diabetes, relatively balanced in gender (10 females, 7 males), with ages ranging from 23 to 70 years old and BMI from 20.3 to 36.5 kg/m². Data was collected continuously for 12 weeks (90 days). My dataset includes many patients (ID type UoM23xx, UoM24xx). Each person has different data types, stored in separate CSV files:
Glucose: continuous measurement (CGM), approximately every 5 minutes
Insulin: Basal (continuous), Bolus (injected with meals)
Nutrition: meal information (carbs, calories…)
Activity: physical activity (approximately every 15 minutes)
Sleep: Sleep status, sleep duration
Currently, my data is fragmented into many files (each patient and each data type is a separate file), so I’m wondering whether I should merge all the data into one file (or one large table) or keep the current structure and process it separately for each patient. The data is also not synchronized in time, either across the data types of one person or across patients. Although the timestamp format is the same everywhere, the measurement times differ (for example, one person was measured at 7:30 and another at 7:31, or the starting times are completely different). I would appreciate your advice on how to handle this dataset.

I can summarize some of the issues with my dataset as follows:

1. Differences in time step (multi-frequency)
   - Glucose: every 5 minutes
   - Activity: every 15 minutes
   - Nutrition, bolus insulin: event-based (irregular)
   - Sleep: interval-based
   It is very difficult to combine them into a common time series.
2. Timestamps do not match
   - Example: glucose at 7:30, activity at 7:45
   There is no common timestamp to join directly.
3. Different collection times among patients
   - Some patients have data from approximately 3 months ago.
   - Some patients have data from approximately 2 months ago.
   The data is inconsistent and uneven.
4. Different start times
   - Person A: started 23/10/2017 19:30
   - Person B: started 23/10/2017 19:31

aritra_pal · April 3, 2026, 2:08am

Hi, I’m not super familiar with the medical side, but I’ll give it a try, so please bear with me.

Timestamp and date - Does the date or the time of day matter in predicting the glucose level? If not, you can probably drop these parameters, although I have a feeling that the time of day might be a fairly important driver.
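
If the time of day does turn out to matter, one common way to feed it to a model is a cyclical (sine/cosine) encoding, so that 23:55 and 00:05 end up close together instead of at opposite ends of a 0 to 24 scale. This is only a minimal sketch: the function name `time_of_day_features` is made up for illustration, and whether this encoding helps for glucose prediction is an assumption that would need to be checked on the data.

```python
import math
from datetime import datetime


def time_of_day_features(ts: datetime) -> tuple[float, float]:
    """Encode the time of day as a point (sin, cos) on the unit circle."""
    minutes = ts.hour * 60 + ts.minute
    angle = 2 * math.pi * minutes / (24 * 60)
    return math.sin(angle), math.cos(angle)


# Two readings ten minutes apart, on either side of midnight.
late = time_of_day_features(datetime(2017, 10, 23, 23, 55))
early = time_of_day_features(datetime(2017, 10, 24, 0, 5))

# Euclidean distance between the two encodings stays small.
gap = math.dist(late, early)
```

The date itself is dropped here; only the position within the day is kept.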

Difference in timing of various activities - This can probably be overcome by capturing most activities as binary columns and the others as categorical columns. For example, physical activity as True/False and sleep duration as a categorical (bucketed) column (you probably won’t need a sleep status column then).
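
To make the binary/categorical idea concrete, here is a rough pandas sketch for one patient. It snaps the CGM readings onto a 5-minute grid and then attaches the other sources to that grid as flags or per-bin sums. All column names, the toy values, the 15-minute carry-forward window, and the sleep bucket edges are assumptions for illustration, not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical toy data for a single patient.
glucose = pd.DataFrame({
    "time": pd.to_datetime(["2017-10-23 19:31", "2017-10-23 19:36",
                            "2017-10-23 19:41", "2017-10-23 19:46"]),
    "glucose": [110.0, 115.0, 121.0, 126.0],
})
activity = pd.DataFrame({
    "time": pd.to_datetime(["2017-10-23 19:30", "2017-10-23 19:45"]),
    "steps": [0, 240],
})
meals = pd.DataFrame({
    "time": pd.to_datetime(["2017-10-23 19:38"]),
    "carbs": [45.0],
})
sleep = pd.DataFrame({
    "start": pd.to_datetime(["2017-10-23 19:44"]),
    "end": pd.to_datetime(["2017-10-23 19:50"]),
})

# 1. Snap CGM readings onto a regular 5-minute grid (19:31 -> 19:30, ...).
grid = (glucose.set_index("time")["glucose"]
        .resample("5min").mean()
        .to_frame()
        .reset_index())

# 2. Activity (roughly every 15 minutes): take the latest report that is
#    at most 15 minutes old, then reduce it to a True/False flag.
grid = pd.merge_asof(grid, activity, on="time",
                     direction="backward", tolerance=pd.Timedelta("15min"))
grid["active"] = grid["steps"].fillna(0) > 0

# 3. Event-based meals: sum carbs per 5-minute bin, 0 where nothing was eaten.
carbs = meals.set_index("time")["carbs"].resample("5min").sum()
grid["carbs"] = carbs.reindex(grid["time"], fill_value=0.0).to_numpy()

# 4. Interval-based sleep: flag grid points that fall inside an interval.
grid["asleep"] = False
for start, end in zip(sleep["start"], sleep["end"]):
    grid.loc[(grid["time"] >= start) & (grid["time"] < end), "asleep"] = True

# 5. Sleep duration as a bucketed column (bucket edges are arbitrary here).
sleep_hours = pd.Series([4.5, 7.0, 9.5])
sleep_bucket = pd.cut(sleep_hours, bins=[0, 6, 8, 24],
                      labels=["short", "normal", "long"])
```

Basal and bolus insulin could be attached in the same way as the meals, by summing doses per bin. Gaps in the CGM signal show up as NaN rows on the grid and would still need a separate decision (drop, interpolate, or mask).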

Combining patient data - It may make sense to combine all the patient data into one dataset for better, more diverse training. Sampling data from different patients when creating sub-batches may be worthwhile.
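
A possible way to do the combining, sketched with pandas: stack the per-patient tables into one long table with a patient ID column, measure time relative to each patient’s own first reading so that different start dates stop mattering, and draw the same number of rows from every patient when building a batch. The patient IDs, column names, and batch size below are placeholders, not values from the real dataset.

```python
import pandas as pd

# Hypothetical per-patient tables, assumed to be on a 5-minute grid already.
per_patient = {
    "patient_A": pd.DataFrame({
        "time": pd.date_range("2017-10-23 19:30", periods=3, freq="5min"),
        "glucose": [110.0, 115.0, 121.0],
    }),
    "patient_B": pd.DataFrame({
        "time": pd.date_range("2017-11-02 08:00", periods=3, freq="5min"),
        "glucose": [95.0, 99.0, 104.0],
    }),
}

# One long table; patient_id records which patient each row belongs to.
combined = pd.concat(
    [df.assign(patient_id=pid) for pid, df in per_patient.items()],
    ignore_index=True,
)

# Time elapsed since each patient's own first reading, in minutes.
first_seen = combined.groupby("patient_id")["time"].transform("min")
combined["minutes_since_start"] = (
    (combined["time"] - first_seen).dt.total_seconds() / 60
)

# A batch with the same number of rows from every patient.
batch = combined.groupby("patient_id").sample(n=2, random_state=0)
```

For sequence models the sampling unit would more likely be a window of consecutive rows than a single row, but the grouping by `patient_id` works the same way.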

It’s an interesting use case, and I’m happy to discuss it further. Let me know your thoughts.