Clarity on Benchmarking Approaches

I’ve seen two benchmarking approaches: one using CUDA events, and the other using the autograd profiler (which is more explicit and, I believe, uses CUDA events under the hood).

Specifically…
CUDA Events approach:

import torch

# ^warm up code

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

torch.cuda.synchronize()  # make sure warm-up work has finished
start.record()
out = model(input)
end.record()
torch.cuda.synchronize()  # wait until the end event has been recorded

print(start.elapsed_time(end))  # milliseconds
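(For context, a common refinement of the events approach is to average over several iterations. Here is a minimal sketch; the `time_fn` helper and its parameters are my own naming, not part of either approach above, and it falls back to wall-clock timing when no GPU is available:)

```python
import time
import torch

def time_fn(fn, iters=10, warmup=3):
    """Return the average time per call of fn() in milliseconds.

    Hypothetical helper illustrating the CUDA-events approach averaged
    over several iterations; not from the original post.
    """
    for _ in range(warmup):  # warm-up: caches, lazy init, autotuning
        fn()
    if torch.cuda.is_available():
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()  # drain pending work before timing
        start.record()
        for _ in range(iters):
            fn()
        end.record()
        torch.cuda.synchronize()  # wait for the end event to complete
        return start.elapsed_time(end) / iters  # elapsed_time is in ms
    # CPU fallback: plain wall-clock timing
    t0 = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - t0) * 1000 / iters
```

Usage would then be something like `time_fn(lambda: model(input))`.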

Autograd Profiler approach:

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    out = model(input)

# sum per-op CUDA time (in microseconds) across all recorded ops;
# this should match the "CUDA time total" shown in prof's table output
total_gpu_time = sum(evt.cuda_time_total for evt in prof.key_averages())
print(total_gpu_time)

What is the difference between these two approaches?