I’ve seen two approaches to benchmarking GPU code in PyTorch: one using CUDA events directly, and the other using the autograd profiler (which is more explicit and, I believe, uses CUDA events under the hood).
CUDA Events approach:
```python
import torch

# warm-up code here
start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)
start.record()
out = model(input)
end.record()
torch.cuda.synchronize()
print(start.elapsed_time(end))  # milliseconds
```
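For context, here is a fuller, self-contained sketch of what I mean by the events approach, with warm-up iterations and averaging over several runs (the `Linear` model and input are just hypothetical stand-ins for my actual `model`/`input`; it falls back to a wall-clock timer when no GPU is present):

```python
import time

import torch
import torch.nn as nn

# Hypothetical stand-ins for the question's `model` and `input`.
model = nn.Linear(128, 128)
x = torch.randn(32, 128)


def time_forward(model, x, warmup=3, iters=10):
    """Average forward-pass time in ms; CUDA events on GPU, perf_counter on CPU."""
    if torch.cuda.is_available():
        model, x = model.cuda(), x.cuda()
        for _ in range(warmup):  # warm-up: kernel caching, allocator growth, etc.
            model(x)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()
        start.record()
        for _ in range(iters):
            model(x)
        end.record()
        torch.cuda.synchronize()  # wait for the GPU before reading the timer
        return start.elapsed_time(end) / iters
    else:
        for _ in range(warmup):
            model(x)
        t0 = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - t0) * 1000 / iters


print(f"avg forward time: {time_forward(model, x):.3f} ms")
```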
Autograd Profiler approach:
```python
with torch.autograd.profiler.profile(use_cuda=True) as prof:
    out = model(input)
total_gpu_time = ...  # get "CUDA time total" from prof
print(total_gpu_time)
```
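For the elided step, what I have in mind is something like summing the per-operator times from `prof.key_averages()` (again with a hypothetical `Linear` model and input as stand-ins; I'm assuming `self_cpu_time_total`/`self_cuda_time_total` on the averaged events, and the block only reads CUDA times when a GPU is available):

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the question's `model` and `input`.
model = nn.Linear(128, 128)
x = torch.randn(32, 128)

use_cuda = torch.cuda.is_available()
if use_cuda:
    model, x = model.cuda(), x.cuda()

with torch.autograd.profiler.profile(use_cuda=use_cuda) as prof:
    out = model(x)

# Sum the averaged per-operator events; times are in microseconds.
events = prof.key_averages()
total_cpu_us = sum(evt.self_cpu_time_total for evt in events)
print(f"CPU time total: {total_cpu_us:.1f}us")
if use_cuda:
    total_cuda_us = sum(evt.self_cuda_time_total for evt in events)
    print(f"CUDA time total: {total_cuda_us:.1f}us")

# The same totals also appear in the printed summary table:
print(events.table(sort_by="cpu_time_total"))
```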
What is the difference between the two approaches mentioned above?