Will profiler.record_function be affected by asynchronous CUDA execution?

The PyTorch profiler docs suggest wrapping each line of your code in a record_function context manager to find the bottleneck.

However, I’m wondering whether the asynchronous nature of CUDA will make the measurements inaccurate in this case. Should we call torch.cuda.synchronize() before entering and before leaving the context?

You should be fine just executing the code as usual, without adding torch.cuda.synchronize() calls around the context.
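For reference, here is a minimal sketch of what "as usual" could look like. It assumes the newer torch.profiler API; the labels "matmul_step" and "relu_step", the helper profile_steps, and the tensor size are made up for illustration. As far as I understand, when ProfilerActivity.CUDA is enabled the profiler collects kernel timings from the device and reports them separately from the CPU-side time of each labeled region, so the CPU column mostly reflects launch overhead while the CUDA column reflects the actual kernel runtime.

```python
import torch
from torch.profiler import profile, record_function, ProfilerActivity


def profile_steps(size=256):
    # Fall back to CPU-only profiling when no GPU is available.
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)

    x = torch.randn(size, size, device=device)

    # Note: no torch.cuda.synchronize() around the labeled regions.
    with profile(activities=activities) as prof:
        with record_function("matmul_step"):  # hypothetical label
            y = x @ x
        with record_function("relu_step"):  # hypothetical label
            z = torch.relu(y)
    return prof


prof = profile_steps()

# Collect the names of all aggregated events, including our labels.
labels = {evt.key for evt in prof.key_averages()}

# Print a summary table sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

On a GPU machine the table should also contain CUDA time columns, which are the ones to compare when looking for the slow step.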