Weird function run time

I'm seeing strange function run times in my code:

import time

start1 = time.time()
func1()
end1 = time.time()
print(end1-start1)

start2 = time.time()
func2()
end2 = time.time()
print(end2-start2)

I have been trying to optimize my code. Originally, func1 took about 0.012 s and func2 about 0.0008 s.
I rewrote func1 using some powerful torch APIs, and now func1 takes about 8*10^-5 s, but func2 takes about 0.012 s…
I didn't change anything in func2.
Why?
Thanks in advance for any replies!

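Are the functions running any operations on the GPU?
CUDA calls are asynchronous, so func1 can return as soon as its kernels are queued, and the time spent waiting for them can end up counted against func2. You should synchronize before stopping the timer (and ideally before starting it as well).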

...
func1()
torch.cuda.synchronize()
end1 = time.time()
...
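
For reference, here is a minimal sketch of a reusable timing helper built on the same idea. The names `timed` and `sync_if_cuda` are hypothetical (they are not from the original post), and the helper assumes it is fine to fall back to plain CPU timing when torch or a GPU is not available.

```python
import time

try:
    import torch
except ImportError:  # torch not installed: fall back to plain CPU timing
    torch = None


def sync_if_cuda():
    """Wait for queued CUDA kernels to finish; does nothing without a GPU."""
    if torch is not None and torch.cuda.is_available():
        torch.cuda.synchronize()


def timed(fn, *args, **kwargs):
    """Call fn once and return (result, elapsed_seconds)."""
    sync_if_cuda()  # drain earlier GPU work so it is not billed to fn
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    sync_if_cuda()  # wait for fn's own kernels before stopping the clock
    return result, time.perf_counter() - start
```

With this, `_, t1 = timed(func1)` followed by `_, t2 = timed(func2)` should attribute the GPU time to whichever function actually launched the work.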

Thank you! I think I've found the function that is actually slow!