I use `torch.profiler` to profile the training time of each function in Megatron-LM, but it fails with `RuntimeError: Can't disable Kineto profiler when it's not running`.
Here's the code I use:
```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity

torch.cuda.synchronize()
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    with record_function("model_fwd"):
        output_tensor = model(tokens, position_ids, attention_mask,
                              labels=labels)
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
torch.cuda.synchronize()
```
And here’s the error:
```
[rank12]: with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
[rank12]: File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 748, in __exit__
[rank12]: self.stop()
[rank12]: File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 764, in stop
[rank12]: self._transit_action(self.current_action, None)
[rank12]: File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 793, in _transit_action
[rank12]: action()
[rank12]: File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 212, in stop_trace
[rank12]: self.profiler.__exit__(None, None, None)
[rank12]: File "/opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py", line 359, in __exit__
[rank12]: self.kineto_results = _disable_profiler()
[rank12]: RuntimeError: Can't disable Kineto profiler when it's not running
```
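One way to narrow this down is to run the same profiler pattern outside Megatron-LM. The sketch below is a minimal, self-contained version of the code above, assuming a toy `nn.Sequential` model as a stand-in for the Megatron-LM model and a hypothetical helper name `profile_forward`. It reads the results only after the `with` block exits, because that is when the profiler stops. If this runs cleanly but the Megatron-LM version does not, one *possible* (unconfirmed) explanation is that another profiler session, for example one started elsewhere in the training loop, is enabled or disabled while this one is active.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, record_function, ProfilerActivity


def profile_forward(model: nn.Module, inputs: torch.Tensor):
    """Profile one forward pass on CPU; return (profiler, output).

    Hypothetical helper: a toy model stands in for the Megatron-LM model.
    Assumption: no other profiler session is running when this is called.
    """
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        # Label the forward pass so it shows up as "model_fwd" in the results.
        with record_function("model_fwd"):
            output = model(inputs)
    # The profiler has stopped here, so its results can be read safely.
    return prof, output


if __name__ == "__main__":
    toy_model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    x = torch.randn(8, 16)
    prof, out = profile_forward(toy_model, x)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

Keeping the profiler scope to a single call and printing outside it makes it easier to tell whether the failure comes from the profiler usage itself or from something else in the surrounding training code.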