Constant val_loss in TimeSeriesDataSet using categorical_encoders

Alex_Pearson · December 11, 2023, 7:08pm

Hello everyone, I’m encountering a peculiar issue with TimeSeriesDataSet in PyTorch Forecasting. Without categorical_encoders, my model trains well: validation loss falls from around 9.5 to 3.3 over 100 epochs (taking chunk 1 as an example). However, when I add categorical_encoders for my group_ids, the validation loss stays stuck at 36.67 for every epoch. This is a major concern because I need these encoders to handle ‘cold start’ issues in real-world data. Here’s a snippet of my implementation for reference:

chunk_size = 200000
total_rows = 184902668
while start_row < total_rows:
    pl.seed_everything(415)
    max_prediction_length = 36
    min_encoder_length = 36
    max_encoder_length = 1240
    training_cutoff = data["time_idx"].max() - max_prediction_length
    training = TimeSeriesDataSet(
        data[lambda x: x.time_idx <= training_cutoff],
        group_ids=["cat_col_1", "cat_col_2", "cat_col_3", "cat_col_4"],
        categorical_encoders={
            'cat_col_1': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
            'cat_col_2': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
            'cat_col_3': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
            'cat_col_4': pytorch_forecasting.data.encoders.NaNLabelEncoder(add_nan=True),
        },
    )
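
For context, this is the behaviour I am hoping to get from add_nan=True, sketched in plain Python. It is only a toy illustration of my understanding, not the library's implementation: the class name UnknownAwareLabelEncoder and the store_* values are made up for the example, and reserving index 0 for unknown values is an assumption about how such an encoder would typically work.

```python
class UnknownAwareLabelEncoder:
    """Toy label encoder (hypothetical, not from pytorch_forecasting).

    Known categories get indices 1..N; anything unseen or missing
    falls back to a reserved index 0.
    """

    UNKNOWN_INDEX = 0  # assumed reserved slot for unseen/missing values

    def fit(self, values):
        # Sort so the mapping is deterministic across runs.
        known = sorted({v for v in values if v is not None})
        self.classes_ = {v: i + 1 for i, v in enumerate(known)}
        return self

    def transform(self, values):
        # Unseen categories (the cold-start case) map to UNKNOWN_INDEX.
        return [self.classes_.get(v, self.UNKNOWN_INDEX) for v in values]


encoder = UnknownAwareLabelEncoder().fit(["store_a", "store_b", "store_a"])
encoded = encoder.transform(["store_b", "store_new", "store_a"])
print(encoded)
```

Here "store_new" never appears during fitting, so it ends up in the reserved slot instead of raising an error, which is what I need for groups that only show up at prediction time.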

Has anyone faced a similar problem, or does anyone have insight into what might be causing this constant validation loss? Any advice or solutions would be greatly appreciated!