How to measure execution time in PyTorch?

tl;dr

The recommended profiling tools are:

torch.profiler: collects detailed operator-level traces of CPU and CUDA activity.
torch.utils.benchmark: times small snippets reliably, handling warm-up, repeats, and CUDA synchronization for you.
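A minimal sketch of the torch.utils.benchmark approach (assuming PyTorch is installed; the matrix multiply is just an illustrative workload):

```python
import torch
from torch.utils import benchmark

a = torch.rand(256, 256)
b = torch.rand(256, 256)

# Timer handles warm-up, repeats, and (on GPU) CUDA
# synchronization automatically.
t = benchmark.Timer(
    stmt="torch.mm(a, b)",
    globals={"a": a, "b": b},
)
m = t.timeit(100)  # run the statement 100 times
print(m.mean)      # average seconds per run
```

On a CUDA tensor the same code measures true GPU execution time, because Timer synchronizes for you.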

CPU-only benchmarking

CPU operations are synchronous, so any Python timing method works. Prefer time.perf_counter() over time.time(): it uses a monotonic, high-resolution clock and is not affected by system clock adjustments.
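A minimal CPU-only sketch using time.perf_counter() and timeit (the workload function is just an illustrative stand-in for your own code):

```python
import time
import timeit

def workload():
    # illustrative CPU-bound work; substitute your own function
    return sorted(range(100_000), key=lambda x: -x)

# single measurement with a monotonic, high-resolution clock
start = time.perf_counter()
workload()
elapsed = time.perf_counter() - start

# more robust: take the best of several repeated runs
best = min(timeit.repeat(workload, number=10, repeat=3)) / 10
print(f"single run: {elapsed:.4f}s, best average: {best:.4f}s")
```

Taking the minimum over repeats filters out interference from other processes, which can only slow a run down, never speed it up.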

CUDA benchmarking

Using time.time() alone is not accurate here: CUDA kernels are launched asynchronously, so it measures only the time taken to enqueue the kernels, not their actual GPU execution time. Calling torch.cuda.synchronize() blocks until all queued GPU work has completed, so timing around it gives an accurate measure of execution time.

import time
import torch

# train() and epochs are assumed to be defined by your training script
train()  # run all operations once for CUDA warm-up
torch.cuda.synchronize()  # wait for warm-up to finish

times = []
for epoch in range(epochs):
    start_epoch = time.perf_counter()
    train()
    torch.cuda.synchronize()  # wait for all GPU work launched by train() to finish
    end_epoch = time.perf_counter()
    times.append(end_epoch - start_epoch)

avg_time = sum(times) / epochs
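An alternative to synchronize-and-wall-clock timing is CUDA events, which record timestamps on the GPU stream itself. A sketch, guarded so it degrades gracefully without a GPU (the matrix multiply is a placeholder workload):

```python
import torch

if torch.cuda.is_available():
    a = torch.rand(1024, 1024, device="cuda")
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)

    torch.mm(a, a)            # warm-up run
    torch.cuda.synchronize()

    start.record()            # timestamp recorded on the GPU stream
    torch.mm(a, a)
    end.record()
    torch.cuda.synchronize()  # wait so elapsed_time is valid

    ms = start.elapsed_time(end)  # milliseconds between the two events
    print(f"kernel time: {ms:.3f} ms")
else:
    print("CUDA not available; skipping GPU timing")
```

Because the timestamps are taken on the device, CUDA events exclude Python and kernel-launch overhead that wall-clock timing includes.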