I am using torch.profiler.profile for profiling my training.
I am sure that prof.step() is called properly; however, each step is saved twice, with slightly different results. You can see what is happening in the screenshot below.
I have checked my code many times but could not identify the cause. Can you guess possible causes?
(Update: I checked the pt.trace.json files carefully and found that almost the same results were saved by two different processes. For example, in rank 0, two processes with pid 395230 and pid 0 wrote the same step records. My guess is that the first one (pid=395230) is the CPU process and the other may be the GPU side.)
(Picture: the same step numbers appear twice, such as 'step16, step16, step17, step17, …'.)
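
The check described in the update can be made repeatable with a small script. The following is only a sketch, assuming the pt.trace.json files use the Chrome trace layout (a top-level "traceEvents" list whose entries carry "name", "pid" and "cat" fields) and that step events are named "ProfilerStep#N". The helper names step_writers and load_trace are hypothetical, and the sample events are made up to mirror the two pids mentioned above; the category strings in the sample are an assumption, not copied from a real trace.

```python
import json
from collections import defaultdict


def step_writers(trace):
    """Map each ProfilerStep name to the sorted (pid, cat) pairs that recorded it."""
    writers = defaultdict(set)
    for event in trace.get("traceEvents", []):
        name = str(event.get("name", ""))
        if name.startswith("ProfilerStep#"):
            # pid may be stored as an int or a string, so normalize it to str.
            pid = str(event.get("pid"))
            cat = str(event.get("cat", ""))
            writers[name].add((pid, cat))
    return {name: sorted(pairs) for name, pairs in writers.items()}


def load_trace(path):
    """Load one trace file from disk (path is supplied by the caller)."""
    with open(path) as f:
        return json.load(f)


# Made-up sample that mirrors the situation in the update:
# the same step recorded under pid 395230 and under pid 0.
# The "cat" values here are assumed for illustration.
sample = {
    "traceEvents": [
        {"name": "ProfilerStep#16", "pid": 395230, "cat": "user_annotation", "ph": "X"},
        {"name": "ProfilerStep#16", "pid": 0, "cat": "gpu_user_annotation", "ph": "X"},
        {"name": "aten::mm", "pid": 395230, "cat": "cpu_op", "ph": "X"},
    ]
}

for name, pairs in step_writers(sample).items():
    print(name, pairs)
```

To run it on a real file, pass the result of load_trace(...) to step_writers instead of the sample. If each step shows up under both the training process pid and a small integer pid with a different category, that would be consistent with the guess that one record comes from the CPU side and the other from the GPU side, rather than prof.step() being called twice.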