TensorBoard and HTA not loading trace files

Hello,

I want to trace my model and started with the PyTorch profiler tutorial:

Step 1: the trace file is saved in the correct folder.

with torch.profiler.profile(
        # start profiling after 1 step, warm up for 1 step, record 3 steps, repeat once
        schedule=torch.profiler.schedule(wait=1, warmup=1, active=3, repeat=1),
        # save the trace for TensorBoard via the trace handler
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./traces'),
        record_shapes=True,
        profile_memory=True,
        with_stack=True
) as prof:
    for step, batch_data in enumerate(train_loader):
        prof.step()  # must be called at each step to mark the step boundary
        if step >= 1 + 1 + 3:
            break
        train(batch_data)

Step 2 (Error 1): Displaying the trace file (JSON) in TensorBoard gives the first error. I searched the forums for similar errors, but what I found did not solve my problem. I tried several paths, so I can exclude the path as the cause.

Step 3 (Error 2): Reading the file (JSON) with Holistic Trace Analysis (HTA).

2024-08-07 18:05:24,170 - hta - trace.py:L389 - INFO - C:/????/?????/??????/PyTorch/log/resnet18/
2024-08-07 18:05:24,307 - hta - trace_file.py:L61 - ERROR - If the trace file does not have the rank specified in it, then add the following snippet key to the json files to use HTA; "distributedInfo": {"rank": 0}. If there are multiple traces files, then each file should have a unique rank value.
2024-08-07 18:05:24,447 - hta - trace_file.py:L61 - ERROR - If the trace file does not have the rank specified in it, then add the following snippet key to the json files to use HTA; "distributedInfo": {"rank": 0}. If there are multiple traces files, then each file should have a unique rank value.
2024-08-07 18:05:24,448 - hta - trace_file.py:L92 - WARNING - There is no item in the rank to trace file map.
2024-08-07 18:05:24,448 - hta - trace.py:L535 - INFO - ranks=[]
2024-08-07 18:05:24,449 - hta - trace.py:L541 - ERROR - The list of ranks to be parsed is empty.
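To see which trace files HTA will reject, a small stdlib-only check can scan the output directory for the rank key the error message mentions. The directory name `./traces` is taken from the profiler snippet above; adjust it to your actual output folder:

```python
import glob
import json

# Scan the profiler output directory (path assumed from the snippet above)
for path in glob.glob("./traces/*.json"):
    with open(path) as f:
        trace = json.load(f)
    rank = trace.get("distributedInfo", {}).get("rank")
    if rank is None:
        print(f"{path}: no distributedInfo/rank -> HTA will skip this file")
    else:
        print(f"{path}: rank={rank}")
```

Any file reported without a rank is one that triggers the `trace_file.py` error above.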

Questions:

  1. Why is the rank missing? Is this the right file?
  2. I saw several notes saying that the TensorBoard plugin is deprecated and HTA is now preferred. Is there a difference between their trace files? Am I misunderstanding something? (The HTA documentation refers to the same code I have used.)

Thank you in advance.

I solved the problem with HTA.

The issue is that HTA is aimed at distributed jobs, so a simple single-GPU example is not covered. If the job runs on only one GPU, the rank has to be added to the trace file manually, exactly as the error message suggests: add "distributedInfo": {"rank": 0} to the JSON file (with a unique rank per file if there are multiple trace files). After that, HTA can read the file.

Still, I am always reluctant to change a machine-generated file by hand, and the follow-up analysis code does not work if the job is not distributed.

PS: This note in the "PyTorch Profiler With TensorBoard" tutorial is super confusing:

Note
TensorBoard Plugin support has been deprecated, so some of these functions may not work as previously. Please take a look at the replacement, HTA.