It isn’t. While there are more refined measures, there is nothing wrong with plain timing. Apparently there are lots of people doing it wrong (both beginners and people with considerable experience), and then there are inaccurate descriptions of what exactly is wrong (“you can’t use `time.time`”; Edit: it is actually true that you should not use `time.time`, but `time.perf_counter()` instead). The main things to get right are warm-up and synchronization.
The thing is that if you use the GPU, unless you call `torch.cuda.synchronize()` before taking the time (for both start and finish), you don’t know what has been executed before and after the time is taken.
I invariably use the following pattern:

```python
def do_stuff():
    for _ in range(100):  # or 1000 or whatever, depending on how long it takes
        do_my_computation()
    torch.cuda.synchronize()

do_stuff()  # warm-up, ends with a synchronize before the timing starts
%timeit do_stuff()
```
Of course, you need to divide the measured time by whatever loop size you use. I usually aim to have something in the msec range or so.
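As a minimal sketch of the divide-by-the-loop-size step (using `time.perf_counter` and a hypothetical CPU stand-in for `do_my_computation`; for GPU code you would add the `torch.cuda.synchronize()` calls as described above):

```python
import time

def do_my_computation():
    # stand-in workload; replace with the operator you actually want to time
    sum(i * i for i in range(10_000))

n_runs = 100  # or 1000 or whatever, depending on how long it takes
start = time.perf_counter()
for _ in range(n_runs):
    do_my_computation()
elapsed = time.perf_counter() - start

per_run_ms = elapsed / n_runs * 1000  # divide by the loop size
print(f"{per_run_ms:.4f} ms per run")
```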
What this does:
- Running the operator (`do_my_computation`) multiple times between syncs reduces the influence of the synchronization (which itself takes time) on the measurement.
- Calling `do_stuff()` once before the timing does two things:
  - warm-up (e.g. some things compile kernels on the fly when called for the first time etc.),
  - synchronization before the timing starts.
- `do_stuff()` ensures that synchronization happens after each run (and thus implicitly before the next).
You can do essentially the same thing with `time.perf_counter()` before and after what is `%timeit` here, except that `%timeit` will actually call `do_stuff` several times and do some stats to help you along. There also is the `timeit` module, which is similar, but there you need to adjust the number of runs manually to the duration of your computation.
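A sketch of the `timeit`-module route, again with a hypothetical CPU stand-in workload (`number` is the run count you have to pick manually; for GPU code, the timed callable would end with `torch.cuda.synchronize()`):

```python
import timeit

def do_my_computation():
    # stand-in workload; replace with your own
    sum(i * i for i in range(10_000))

number = 1000  # adjust manually to the duration of your computation
total = timeit.timeit(do_my_computation, number=number)
print(f"{total / number * 1e6:.1f} usec per run")
```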
That said, the profiler gives you more detailed information with very little effort.
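(The profiler meant here is presumably the PyTorch one, `torch.profiler`. As a dependency-free illustration of the kind of per-function breakdown a profiler gives you, the stdlib `cProfile` with the same hypothetical stand-in workload:)

```python
import cProfile
import io
import pstats

def do_my_computation():
    # stand-in workload
    sum(i * i for i in range(10_000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(100):
    do_my_computation()
profiler.disable()

# print the five most expensive entries by cumulative time
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
print(buf.getvalue())
```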