Hello All,
I am using Google's TFT (Temporal Fusion Transformer) model for multi-step time series forecasting. However, when fitting my trainer object,
I get the following error: index out of range
Does anyone know what the problem is?
Thanks in advance
You have not given enough information for any useful suggestion, beyond the observation that an indexing operation apparently fails.
Check the stack trace to see where the error is coming from.
Yes, sorry, I should have provided more information. I have already checked the stack trace to see where the error is coming from:
IndexError Traceback (most recent call last)
Cell In[326], line 2
1 # fit network
----> 2 trainer.fit(
3 tft,
4 train_dataloaders=train_dataloader,
5 val_dataloaders=val_dataloader,
6 )
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\trainer.py:529, in Trainer.fit(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
527 model = _maybe_unwrap_optimized(model)
528 self.strategy._lightning_module = model
--> 529 call._call_and_handle_interrupt(
530 self, self._fit_impl, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path
531 )
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\call.py:42, in _call_and_handle_interrupt(trainer, trainer_fn, *args, **kwargs)
40 if trainer.strategy.launcher is not None:
41 return trainer.strategy.launcher.launch(trainer_fn, *args, trainer=trainer, **kwargs)
---> 42 return trainer_fn(*args, **kwargs)
44 except _TunerExitException:
45 _call_teardown_hook(trainer)
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\trainer.py:568, in Trainer._fit_impl(self, model, train_dataloaders, val_dataloaders, datamodule, ckpt_path)
558 self._data_connector.attach_data(
559 model, train_dataloaders=train_dataloaders, val_dataloaders=val_dataloaders, datamodule=datamodule
560 )
562 ckpt_path = self._checkpoint_connector._select_ckpt_path(
563 self.state.fn,
564 ckpt_path,
565 model_provided=True,
566 model_connected=self.lightning_module is not None,
567 )
--> 568 self._run(model, ckpt_path=ckpt_path)
570 assert self.state.stopped
571 self.training = False
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\trainer.py:973, in Trainer._run(self, model, ckpt_path)
968 self._signal_connector.register_signal_handlers()
970 # ----------------------------
971 # RUN THE TRAINER
972 # ----------------------------
--> 973 results = self._run_stage()
975 # ----------------------------
976 # POST-Training CLEAN UP
977 # ----------------------------
978 log.debug(f"{self.__class__.__name__}: trainer tearing down")
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\trainer.py:1014, in Trainer._run_stage(self)
1012 if self.training:
1013 with isolate_rng():
-> 1014 self._run_sanity_check()
1015 with torch.autograd.set_detect_anomaly(self._detect_anomaly):
1016 self.fit_loop.run()
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\trainer.py:1043, in Trainer._run_sanity_check(self)
1040 call._call_callback_hooks(self, "on_sanity_check_start")
1042 # run eval step
-> 1043 val_loop.run()
1045 call._call_callback_hooks(self, "on_sanity_check_end")
1047 # reset logger connector
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\loops\utilities.py:177, in _no_grad_context.<locals>._decorator(self, *args, **kwargs)
175 context_manager = torch.no_grad
176 with context_manager():
--> 177 return loop_run(self, *args, **kwargs)
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\loops\evaluation_loop.py:122, in _EvaluationLoop.run(self)
120 self._restarting = False
121 self._store_dataloader_outputs()
--> 122 return self.on_run_end()
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\loops\evaluation_loop.py:244, in _EvaluationLoop.on_run_end(self)
241 self.trainer._logger_connector._evaluation_epoch_end()
243 # hook
--> 244 self._on_evaluation_epoch_end()
246 logged_outputs, self._logged_outputs = self._logged_outputs, [] # free memory
247 # include any logged outputs on epoch_end
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\loops\evaluation_loop.py:326, in _EvaluationLoop._on_evaluation_epoch_end(self)
324 hook_name = "on_test_epoch_end" if trainer.testing else "on_validation_epoch_end"
325 call._call_callback_hooks(trainer, hook_name)
--> 326 call._call_lightning_module_hook(trainer, hook_name)
328 trainer._logger_connector.on_epoch_end()
File ~\AppData\Local\anaconda3\lib\site-packages\lightning\pytorch\trainer\call.py:144, in _call_lightning_module_hook(trainer, hook_name, pl_module, *args, **kwargs)
141 pl_module._current_fx_name = hook_name
143 with trainer.profiler.profile(f"[LightningModule]{pl_module.__class__.__name__}.{hook_name}"):
--> 144 output = fn(*args, **kwargs)
146 # restore current_fx when nested context
147 pl_module._current_fx_name = prev_fx_name
File ~\AppData\Local\anaconda3\lib\site-packages\pytorch_forecasting\models\base_model.py:636, in BaseModel.on_validation_epoch_end(self)
635 def on_validation_epoch_end(self):
--> 636 self.on_epoch_end(self.validation_step_outputs)
637 self.validation_step_outputs.clear()
File ~\AppData\Local\anaconda3\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py:539, in TemporalFusionTransformer.on_epoch_end(self, outputs)
535 """
536 run at epoch end for training or validation
537 """
538 if self.log_interval > 0 and not self.training:
--> 539 self.log_interpretation(outputs)
File ~\AppData\Local\anaconda3\lib\site-packages\pytorch_forecasting\models\temporal_fusion_transformer\__init__.py:796, in TemporalFusionTransformer.log_interpretation(self, outputs)
789 """
790 Log interpretation metrics to tensorboard.
791 """
792 # extract interpretations
793 interpretation = {
794 # use padded_stack because decoder length histogram can be of different length
795 name: padded_stack([x["interpretation"][name].detach() for x in outputs], side="right", value=0).sum(0)
--> 796 for name in outputs[0]["interpretation"].keys()
797 }
798 # normalize attention with length histogram squared to account for: 1. zeros in attention and
799 # 2. higher attention due to less values
800 attention_occurances = interpretation["encoder_length_histogram"][1:].flip(0).float().cumsum(0)
IndexError: list index out of range
I also think it will be helpful to see how I defined my validation dataloader, which I think is causing the problem:
```python
from pytorch_forecasting import TimeSeriesDataSet

training = TimeSeriesDataSet(
    data=data_PA6[lambda x: x.time_idx <= training_cutoff],
    group_ids=["group"],
    time_idx="time_idx",
    target="best_price_compound",
    min_encoder_length=1,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=3,
    min_prediction_length=6,
    max_prediction_length=6,
    time_varying_unknown_reals=["best_price_compound"],
    predict_mode=False,
)
# create validation set (predict=True), which means to predict the last max_prediction_length points in time
# for each series
validation = TimeSeriesDataSet.from_dataset(training, data_PA6, predict=True, stop_randomization=True)
# create dataloaders for model
batch_size = 32  # set this between 32 and 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=0, shuffle=False)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=0, shuffle=False)
```
And here is my trainer object:
```python
import lightning.pytorch as pl
from lightning.pytorch.callbacks import EarlyStopping, LearningRateMonitor
from lightning.pytorch.loggers import TensorBoardLogger

# configure network and trainer
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=10, verbose=False, mode="min")
lr_logger = LearningRateMonitor()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # log results to TensorBoard
trainer = pl.Trainer(
    max_epochs=100,
    accelerator="cpu",
    enable_model_summary=True,
    gradient_clip_val=0.1,
    limit_train_batches=10,  # use only 10 batches per training epoch
    # fast_dev_run=True,  # comment in to check that the network or dataset has no serious bugs
    callbacks=[lr_logger, early_stop_callback],
    logger=logger,
    # num_sanity_val_steps=3,
)
```
Thanks in advance for the help
From the traceback, we can see that outputs is empty (so outputs[0] raises "list index out of range"), and that outputs is self.validation_step_outputs … but how does this help?
After debugging, I think the problem is that validation_step is never called, because on_validation_epoch_end is called directly:
```python
def on_validation_epoch_end(self):
    self.on_epoch_end(self.validation_step_outputs)
    self.validation_step_outputs.clear()
```
Here self.validation_step_outputs is an empty list, as mentioned before,
while I think this function is the one responsible for adding the val_loss to the list:
```python
def validation_step(self, batch, batch_idx):
    x, y = batch
    log, out = self.step(x, y, batch_idx)
    log.update(self.create_log(x, y, out, batch_idx))
    self.validation_step_outputs.append(log)
    return log
```
Does that sound correct? If so, why is it not called, and why does the list stay empty?
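To make the failure pattern concrete, here is a minimal sketch in plain Python that does not use Lightning or PyTorch Forecasting. The class EpochEndSketch and its log contents are hypothetical stand-ins; the sketch only assumes that the epoch-end hook indexes outputs[0] the way log_interpretation does in the traceback, and shows one way the list can stay empty, namely when no batch ever reaches validation_step.

```python
class EpochEndSketch:
    """Hypothetical stand-in for a model that collects per-batch validation logs."""

    def __init__(self):
        self.validation_step_outputs = []

    def validation_step(self, batch):
        # Each call appends one log dict, as in the validation_step quoted above.
        log = {"interpretation": {"attention": sum(batch)}}
        self.validation_step_outputs.append(log)
        return log

    def on_validation_epoch_end(self):
        outputs = self.validation_step_outputs
        # Same access pattern as in the traceback: outputs[0] fails on an empty list.
        names = list(outputs[0]["interpretation"].keys())
        outputs.clear()
        return names


model = EpochEndSketch()

# Case 1: no batch was processed, so validation_step never ran.
try:
    model.on_validation_epoch_end()
    error_message = None
except IndexError as err:
    error_message = str(err)

# Case 2: at least one batch is processed before the epoch-end hook runs.
model.validation_step([1.0, 2.0])
names_after_batch = model.on_validation_epoch_end()

print(error_message)
print(names_after_batch)
```

If this matches what happens in the real run, checking how many samples and batches the validation dataset actually produces would be a reasonable next debugging step.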
I don’t know, but it sounds more like a Lightning-related question, and I’m not deeply familiar with this higher-level API.
@El-Hassan-Hajbi, this is PyTorch Forecasting. Can you share how you define your data? I am talking about the TimeSeriesDataSet part. There are a couple of things that could potentially be wrong.
```python
training = TimeSeriesDataSet(
    data=data_PA6[lambda x: x.time_idx <= training_cutoff],
    group_ids=["group"],
    time_idx="time_idx",
    target="best_price_compound",
    min_encoder_length=1,  # keep encoder length long (as it is in the validation set)
    max_encoder_length=3,
    min_prediction_length=6,
    max_prediction_length=6,
    time_varying_unknown_reals=["best_price_compound"],
    predict_mode=False,
)
# create validation set (predict=True), which means to predict the last max_prediction_length points in time
# for each series
validation = TimeSeriesDataSet.from_dataset(training, data_PA6, predict=True, stop_randomization=True)
```
Add this argument to your TimeSeriesDataSet:
add_relative_time_idx=True
Let me know if this solves your problem.
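For reference, the suggested change applied to the arguments from the question might look like the sketch below. It only builds the keyword arguments as a plain dict, so it runs without the library installed. Passing them on as TimeSeriesDataSet(data=..., **dataset_kwargs) is an assumption about how you would wire it in; data_PA6 and training_cutoff are the names from the original post.

```python
# Keyword arguments from the question, plus the suggested flag.
dataset_kwargs = dict(
    group_ids=["group"],
    time_idx="time_idx",
    target="best_price_compound",
    min_encoder_length=1,
    max_encoder_length=3,
    min_prediction_length=6,
    max_prediction_length=6,
    time_varying_unknown_reals=["best_price_compound"],
    predict_mode=False,
    add_relative_time_idx=True,  # the suggested change
)

# Assumed usage (requires pytorch_forecasting and the data from the question):
# from pytorch_forecasting import TimeSeriesDataSet
# training = TimeSeriesDataSet(
#     data=data_PA6[lambda x: x.time_idx <= training_cutoff],
#     **dataset_kwargs,
# )

print(dataset_kwargs["add_relative_time_idx"])
```

The validation set is created with TimeSeriesDataSet.from_dataset(training, ...), so it should pick up the same setting from the training dataset.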
Thanks! It works.
I am interested in knowing more about how it solved the problem.
Yes. TFT is based on the transformer architecture, which means an encoder-decoder with attention. You have a feature for the encoder, but the decoder needs input features as well. If add_relative_time_idx is left at False, the list/dataframe of outputs ends up empty, and the code then tries to access its elements, hence the error message.
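As a rough illustration of why a relative time index can serve as a decoder input: the PyTorch Forecasting documentation describes it as ranging from -encoder_length to prediction_length within each sampled sequence, so its values for future steps are known without knowing the target. The helper name relative_positions below is hypothetical, and any scaling the library may apply to the values is left out.

```python
def relative_positions(encoder_length, prediction_length):
    """Integer positions relative to the first prediction step (hypothetical helper)."""
    return list(range(-encoder_length, prediction_length))


# Lengths taken from the dataset in the question: up to 3 encoder steps, 6 prediction steps.
positions = relative_positions(encoder_length=3, prediction_length=6)
encoder_part = positions[:3]  # steps where the target has been observed
decoder_part = positions[3:]  # future steps; known in advance because they are only positions

print(encoder_part)
print(decoder_part)
```

By contrast, best_price_compound is declared as a time-varying unknown real, so its future values are not available to the decoder.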
Okay, thanks. Also, I didn’t understand when to set add_encoder_length = True.