Why does running torch.unique twice give different running times?

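The code is:

```python
import time

import torch

# src is an existing tensor (its definition is not shown in the post)
time1 = time.time()
unique, inv = torch.unique(src, sorted=False, return_inverse=True)
src2 = src.clone()
time2 = time.time()  # note: the first interval also includes the clone()
unique, inv = torch.unique(src2, sorted=True, return_inverse=True)
time3 = time.time()
print(time2 - time1, time3 - time2)
```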

I get these times: 0.015712261199951172 0.000576972961425781
It's very strange: the running time seems to have nothing to do with whether the output is sorted or not. The first call is always slower than the later one.

If you are using the GPU, you have to synchronize the code (e.g. with torch.cuda.synchronize()) before starting and stopping the timers, since CUDA operations are executed asynchronously. A better way to profile such workloads is to use torch.utils.benchmark, whose Timer performs warm-up runs and synchronizes CUDA for you.
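A minimal sketch of both approaches, assuming a 1-D integer tensor. The input size, the helper names `sync` and `timed_unique`, and the explicit warm-up call are illustrative assumptions, not code from the original post. It runs on the GPU when one is available and falls back to the CPU otherwise (where the synchronize calls are skipped).

```python
import time

import torch
import torch.utils.benchmark as benchmark

# Use the GPU when available; on CPU no synchronization is needed.
device = "cuda" if torch.cuda.is_available() else "cpu"
# Hypothetical input: random integers with many duplicates.
src = torch.randint(0, 1000, (100_000,), device=device)


def sync():
    # CUDA kernels are launched asynchronously; wait for all queued work
    # to finish so the host-side timer measures the actual computation.
    if device == "cuda":
        torch.cuda.synchronize()


def timed_unique(x, sorted_flag):
    sync()  # make sure earlier work is not counted in this measurement
    start = time.perf_counter()
    out = torch.unique(x, sorted=sorted_flag, return_inverse=True)
    sync()  # make sure this call has really finished
    return out, time.perf_counter() - start


# Warm-up call: the first call can pay one-off costs (e.g. CUDA context
# setup, memory allocation), so it is excluded from the comparison.
timed_unique(src, True)

(u1, inv1), t_unsorted = timed_unique(src, False)
(u2, inv2), t_sorted = timed_unique(src.clone(), True)
print(f"sorted=False: {t_unsorted:.6f} s, sorted=True: {t_sorted:.6f} s")

# torch.utils.benchmark.Timer does warm-up and CUDA synchronization itself
# and reports statistics over many runs.
for flag in (False, True):
    timer = benchmark.Timer(
        stmt="torch.unique(x, sorted=flag, return_inverse=True)",
        globals={"torch": torch, "x": src, "flag": flag},
    )
    print(timer.blocked_autorange())
```

With synchronization and a warm-up in place, the two calls are compared on equal footing, so any remaining difference reflects the `sorted` flag rather than one-off startup costs or work left over from earlier asynchronous launches.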

Thanks for your reply! After synchronizing the code, the running times of the two operations are of the same order of magnitude.