torch.autograd.profiler doesn't save much

I have a training script that I launch with python -m torch.distributed.launch --nproc_per_node=1 --use_env, and I would like to profile it, so I did something like this:

import torch.autograd.profiler as profiler

with profiler.profile() as prof:
    with profiler.record_function("training"):
        print("Start training")
        for epoch in range(epochs):
            ...  # training loop body (omitted)
if is_main_process():
    prof.export_chrome_trace(output_dir / 'trace.json')

The resulting JSON contains almost nothing:

[{"name": "training", "ph": "X", "ts": 140.203, "dur": 139.73299999999998, "tid": 1, "pid": "CPU functions", "args": {}}]

What should I do to get detailed running times for all the operations in my training?

Thank you very much in advance for your help!

The cause is in this code: you put too much code inside this block.

with profiler.record_function("training"):
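
A minimal sketch of what narrower blocks might look like, with one record_function label per phase of a training step instead of one label around the whole run. The model, optimizer, tensors, and label names here are hypothetical stand-ins rather than anything from the original script, and whether this alone fills in the trace from the question is not confirmed here.

```python
import os
import tempfile

import torch
import torch.autograd.profiler as profiler

# Hypothetical stand-ins for the real model, optimizer, and data.
model = torch.nn.Linear(16, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
inputs = torch.randn(8, 16)
targets = torch.randn(8, 4)

with profiler.profile() as prof:
    for step in range(3):
        # Label each phase separately instead of wrapping the whole run.
        with profiler.record_function("forward"):
            loss = torch.nn.functional.mse_loss(model(inputs), targets)
        with profiler.record_function("backward"):
            optimizer.zero_grad()
            loss.backward()
        with profiler.record_function("optimizer_step"):
            optimizer.step()

# Write the trace to a temporary directory (an assumption for this sketch).
trace_path = os.path.join(tempfile.mkdtemp(), "trace.json")
prof.export_chrome_trace(trace_path)
```

The exported file can be opened in chrome://tracing, where each label should show up as its own named span.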

@hhaoao Thanks. But how do we know how much is too much?

Personally, I would summarize the use of record_function in the following points:

  1. Leave it out where it would be redundant; wrap only the code segments for which you want overall statistics.
  2. Use it to label code snippets that need a name to be easy to find when viewing the trace.

Of course, there are more uses than these; you can explore them yourself.
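
As a rough illustration of collecting per-label statistics with record_function, the sketch below labels one region and then reads the aggregated results with key_averages(). The toy model, the input tensor, and the "forward" label are assumptions made for the example.

```python
import torch
import torch.autograd.profiler as profiler

model = torch.nn.Linear(16, 4)  # hypothetical toy model
inputs = torch.randn(8, 16)     # hypothetical input batch

with profiler.profile() as prof:
    for _ in range(3):
        # The label groups everything inside this block under one name.
        with profiler.record_function("forward"):
            model(inputs)

# key_averages() merges events that share a name; each entry exposes
# the name as .key and the number of occurrences as .count.
averages = prof.key_averages()
calls_per_label = {evt.key: evt.count for evt in averages}

# Print a summary table sorted by total CPU time.
print(averages.table(sort_by="cpu_time_total", row_limit=10))
```

The printed table lists the labeled region alongside the individual operators, which gives a per-operation breakdown without opening the trace file.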