I am training a Temporal Fusion Transformer and getting this error: AssertionError: filters should not remove entries all entries - check encoder/decoder lengths and lags
My code:
max_encoder_length: int = train.groupby('contract_id').size().max()
min_encoder_length: int = 1
min_prediction_length: int = 1
max_prediction_length: int = 13
batch_size: int = 32
hidden_size: int = study.best_trial.params['hidden_size']
lstm_layers: int = 3
attention_head_size: int = study.best_trial.params['attention_head_size']
dropout: float = study.best_trial.params['dropout']
hidden_continuous_size: int = study.best_trial.params['hidden_continuous_size']
learning_rate: float = study.best_trial.params['learning_rate']
max_epochs: int = 50
random_seed: int = 0
training_dataset = TimeSeriesDataSet(
# df_1[lambda x: x.time_idx <= training_cutoff],
train,
time_idx='time_idx',
target='NetRevenue',
group_ids=['contract_id','cluster','Sector'],
min_encoder_length=min_encoder_length,
max_encoder_length=max_encoder_length,
min_prediction_length=min_prediction_length,
max_prediction_length=max_prediction_length,
static_categoricals=['contract_id','Sector','cluster'],
static_reals=[],
time_varying_known_categoricals=['flag_zero_revenue','flag_engagement_near_end_indx'],
time_varying_unknown_categoricals=['invoice_raised','Amt_Received'],
time_varying_unknown_reals=[
'NetRevenue',
'GLHours','PTOAdjustment',
'WorkingDays'],
time_varying_known_reals=[
'period_sin','period_cos','time_idx','FiscalYear','Period','weighted_days_to_end'],
target_normalizer=GroupNormalizer(
groups=['contract_id','cluster','Sector'],transformation='softplus',center=True
),
add_relative_time_idx=True,
add_target_scales=True,
add_encoder_length=True,
allow_missing_timesteps=True,
categorical_encoders={
'contract_id': NaNLabelEncoder(add_nan=True, warn=False),
'cluster': NaNLabelEncoder(add_nan=True, warn=False),
# 'ProjectTypeDesc': NaNLabelEncoder(add_nan=True, warn=False),
# 'Sector': NaNLabelEncoder(add_nan=True, warn=False),
# 'flag_zero_revenue_in_fiscal_year':NaNLabelEncoder(add_nan=True, warn=False),
# 'NetRevenue': NaNLabelEncoder(add_nan=True, warn=False),
# 'project_start': NaNLabelEncoder(add_nan=True, warn=False),
# 'project_end': NaNLabelEncoder(add_nan=True, warn=False),
# 'fiscal_year': NaNLabelEncoder(add_nan=True, warn=False),
# 'WorkingDays': NaNLabelEncoder(add_nan=True, warn=False),
}
)
validation_dataset = TimeSeriesDataSet.from_dataset(
dataset=training_dataset,
data=val,
stop_randomization=True,
predict=True
# min_prediction_idx = int(training_cutoff + 1)
)
I get the assertion error when creating validation_dataset.
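To narrow this down, here is a minimal diagnostic sketch in plain Python. It rests on my assumption (not confirmed) that with predict=True every series in the validation data must span at least min_encoder_length + max_prediction_length time steps (1 + 13 = 14 here), and that the error means no group in val is that long. The helper names series_spans and usable_groups and the toy rows are hypothetical and not part of pytorch-forecasting.

```python
def series_spans(rows):
    """Map each group key to its time span: max(time_idx) - min(time_idx) + 1.

    `rows` is an iterable of (group_key, time_idx) pairs.
    """
    bounds = {}
    for key, t in rows:
        lo, hi = bounds.get(key, (t, t))
        bounds[key] = (min(lo, t), max(hi, t))
    return {key: hi - lo + 1 for key, (lo, hi) in bounds.items()}


def usable_groups(rows, min_encoder_length, prediction_length):
    """Keep only groups whose span covers encoder + prediction window.

    Assumption: the required length is min_encoder_length + prediction_length.
    """
    required = min_encoder_length + prediction_length
    return {k: s for k, s in series_spans(rows).items() if s >= required}


# Hypothetical toy data: group "A" has 13 time steps, group "B" has 20.
toy_rows = [("A", t) for t in range(13)] + [("B", t) for t in range(20)]

usable = usable_groups(toy_rows, min_encoder_length=1, prediction_length=13)
print(usable)
```

On the real data the rows could be built with zip(zip(val['contract_id'], val['cluster'], val['Sector']), val['time_idx']). If the result is empty, that would be consistent with the assertion, since every group would be filtered out.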
If anyone can help me identify the cause, provide a solution, and explain it, that would be a great help.