I’ve seen two approaches to benchmarking GPU code in PyTorch: one using CUDA events directly, and the other using the autograd profiler (which is more explicit and, I believe, uses CUDA events under the hood).
CUDA Events approach:
```python
import torch

# warm-up code here
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
out = model(input)
end.record()
torch.cuda.synchronize()
print(start.elapsed_time(end))  # milliseconds
```
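For context, here is a fuller, self-contained sketch of what I mean by the events approach, with warm-up iterations and averaging over several runs (the `Linear` model and input are just hypothetical stand-ins for my actual `model`/`input`; it falls back to a wall-clock timer when no GPU is present):

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-ins for the question's `model` and `input`.
model = nn.Linear(128, 128)
x = torch.randn(32, 128)


def time_forward(model, x, warmup=3, iters=10):
    """Average forward-pass time in ms; CUDA events on GPU, perf_counter on CPU."""
    if torch.cuda.is_available():
        model, x = model.cuda(), x.cuda()
        for _ in range(warmup):  # warm-up: kernel caching, allocator growth, etc.
            model(x)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()  # wait for the GPU before reading the timer
        return start.elapsed_time(end) / iters
    else:
        for _ in range(warmup):
            model(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - t0) * 1000 / iters


print(f"avg forward time: {time_forward(model, x):.3f} ms")
```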
Autograd Profiler approach:
```python
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    out = model(input)
total_gpu_time = ...  # get "CUDA time total" from prof
print(total_gpu_time)
```
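For the elided step, what I have in mind is something like summing the per-operator times from `prof.key_averages()` (again with a hypothetical `Linear` model and input as stand-ins; I'm assuming `self_cpu_time_total`/`self_cuda_time_total` on the averaged events, and the block only reads CUDA times when a GPU is available):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the question's `model` and `input`.
model = nn.Linear(128, 128)
x = torch.randn(32, 128)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model, x = model.cuda(), x.cuda()

with torch.autograd.profiler.profile(use_cuda=use_cuda) as prof:
    out = model(x)

# Sum the averaged per-operator events; times are in microseconds.
events = prof.key_averages()
total_cpu_us = sum(evt.self_cpu_time_total for evt in events)
print(f"CPU time total: {total_cpu_us:.1f}us")
if use_cuda:
    total_cuda_us = sum(evt.self_cuda_time_total for evt in events)
    print(f"CUDA time total: {total_cuda_us:.1f}us")

# The same totals also appear in the printed summary table:
print(events.table(sort_by="cpu_time_total"))
```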
What is the difference between the two approaches mentioned above?