I’m using the following code to train an X3D model:
import torch
import pytorch_lightning
from lightning.pytorch.profilers import PyTorchProfiler
from pytorch_lightning.callbacks import ModelCheckpoint, LearningRateMonitor, StochasticWeightAveraging, BackboneFinetuning

# Skip the first 20 steps, then cycle: wait 5 steps, warm up for 1, record 5
schedule = torch.profiler.schedule(skip_first=20, wait=5, warmup=1, active=5)
profiler = PyTorchProfiler(filename="profile", schedule=schedule)
trainer_kwargs = {
    "max_epochs": 2,
    "precision": 16,
    "deterministic": True,
    "benchmark": True,
}
callbacks = list()
callbacks.append(LearningRateMonitor(logging_interval="epoch"))
callbacks.append(ModelCheckpoint(monitor="val/accuracy", mode="max", save_top_k=1))
callbacks.append(GenerateMetricImages())
model_module = _model_init_func(**model_kwargs, **data_kwargs, **trainer_kwargs)
data_module = _dataset_init_func(videos_dir, train_file, val_file, test_file, model_params, **data_kwargs)
trainer = pytorch_lightning.Trainer(
    logger=[tb_logger], profiler=profiler, callbacks=callbacks, **trainer_kwargs
)
trainer.fit(model_module, data_module)
trainer.test(model_module, data_module)
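For reference, this is how I understand the `torch.profiler.schedule(...)` arguments above to map step indices to profiler actions. It is a small pure-Python re-implementation written only for illustration: the `profiler_action` helper is hypothetical and returns strings named after the `torch.profiler.ProfilerAction` members. It assumes the documented semantics: the first `skip_first` steps are ignored, then the profiler cycles through `wait`, `warmup` and `active` steps, repeating indefinitely when `repeat=0`.

```python
def profiler_action(step, *, wait, warmup, active, repeat=0, skip_first=0):
    """Hypothetical re-implementation of the step -> action mapping
    described for torch.profiler.schedule (not the real torch code)."""
    if step < skip_first:
        return "NONE"  # skipped steps are never profiled
    step -= skip_first
    cycle = wait + warmup + active
    if repeat > 0 and step // cycle >= repeat:
        return "NONE"  # all requested cycles are done
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    # The last active step of each cycle also saves the trace
    return "RECORD" if pos < cycle - 1 else "RECORD_AND_SAVE"


# Parameters taken from the schedule in the question
params = dict(skip_first=20, wait=5, warmup=1, active=5)
first_profiled = next(s for s in range(1000) if profiler_action(s, **params) != "NONE")
print(first_profiled)  # 25
print([profiler_action(s, **params) for s in range(25, 31)])
# ['WARMUP', 'RECORD', 'RECORD', 'RECORD', 'RECORD', 'RECORD_AND_SAVE']
```

With these settings the first non-idle step is index 25, so a loop of 25 steps or fewer never gets past `NONE`. If the test loop counts its own steps and the test set is that small, the profiler would have nothing to report during `trainer.test`, which might be worth ruling out.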
Training finishes without issues, but when running trainer.test() I get this error:
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/torch/profiler/profiler.py", line 191, in events
    assert self.profiler
           ^^^^^^^^^^^^^
AssertionError
Exception ignored in: <function Profiler.__del__ at 0x72cac877eca0>
Traceback (most recent call last):
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/lightning/pytorch/profilers/profiler.py", line 147, in __del__
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/lightning/pytorch/profilers/pytorch.py", line 561, in teardown
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/lightning/pytorch/profilers/pytorch.py", line 545, in _delete_profilers
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/lightning/pytorch/profilers/pytorch.py", line 537, in _cache_functions_events
  File "/home/my_user/anaconda3/envs/point_outcome_updated/lib/python3.11/site-packages/torch/profiler/profiler.py", line 191, in events
AssertionError:
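My reading of the first frame is that `events()` asserts that an underlying profiler object exists, and in `torch.profiler` that object only seems to be created once the schedule first enters a warmup or record phase. The toy class below models that assumption; `ToyScheduledProfiler` and `toy_schedule` are hypothetical and heavily simplified, not the real torch or Lightning implementation. If `step()` never drives the schedule out of `"NONE"`, calling `events()` fails with the same bare `AssertionError`.

```python
class ToyScheduledProfiler:
    """Hypothetical, simplified model of a scheduled profiler.

    Assumption: the inner profiler object is created lazily the first
    time the schedule returns something other than "NONE".
    """

    def __init__(self, schedule):
        self.schedule = schedule
        self.step_num = 0
        self.profiler = None  # no inner profiler until warmup/record

    def step(self):
        action = self.schedule(self.step_num)
        if action != "NONE" and self.profiler is None:
            self.profiler = {"events": []}  # stand-in for the real inner profiler
        if self.profiler is not None and action.startswith("RECORD"):
            self.profiler["events"].append(self.step_num)
        self.step_num += 1

    def events(self):
        assert self.profiler  # mirrors the failing line in the traceback
        return self.profiler["events"]


def toy_schedule(step):
    # Hypothetical schedule: idle for 3 steps, then record every step
    return "NONE" if step < 3 else "RECORD"


long_run = ToyScheduledProfiler(toy_schedule)
for _ in range(5):
    long_run.step()
print(long_run.events())  # [3, 4]

short_run = ToyScheduledProfiler(toy_schedule)
for _ in range(2):  # fewer steps than the schedule's idle phase
    short_run.step()
try:
    short_run.events()
except AssertionError:
    print("AssertionError: the inner profiler was never created")
```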
I found a discussion here stating that “the training phase seems to kill the pytorch profiler, that therefore doesn’t exist anymore for the test phase”, but that issue was supposedly fixed in version 1.5.5, and I’m using pytorch-lightning==2.3.3.
Any suggestions?