torch.profiler fails to profile in Megatron-LM

I am using torch.profiler to measure the time spent in each function during Megatron-LM training, but it fails with: RuntimeError: Can't disable Kineto profiler when it's not running.
Here's the code I use:

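    import torch
    from torch.profiler import profile, record_function, ProfilerActivity

    # Wait for pending GPU work to finish before profiling starts.
    torch.cuda.synchronize()
    # Profile CPU activity only, for a single forward pass.
    with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
        with record_function("model_fwd"):
            output_tensor = model(tokens, position_ids, attention_mask,
                                  labels=labels)
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
    torch.cuda.synchronize()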

And here’s the error:

[rank12]:     with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
[rank12]:   File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 748, in __exit__
[rank12]:     self.stop()
[rank12]:   File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 764, in stop
[rank12]:     self._transit_action(self.current_action, None)
[rank12]:   File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 793, in _transit_action
[rank12]:     action()
[rank12]:   File "/opt/conda/lib/python3.10/site-packages/torch/profiler/profiler.py", line 212, in stop_trace
[rank12]:     self.profiler.__exit__(None, None, None)
[rank12]:   File "/opt/conda/lib/python3.10/site-packages/torch/autograd/profiler.py", line 359, in __exit__
[rank12]:     self.kineto_results = _disable_profiler()
[rank12]: RuntimeError: Can't disable Kineto profiler when it's not running