torch.where became 10x slower than normal

I’m pretty sure the problem is with my machine, probably the CPU. Everything else seems fine, but one day torch.where became really slow.

import time
import torch

print(torch.__version__)
t = torch.randint(0, 100, (1000, 1000))

# Time a single call to torch.where on the boolean mask.
start = time.time()
torch.where(t == 0)
print('torch.where time:', time.time() - start)

The result is:

1.6.0
torch.where time: 0.041625261306762695

But on my own MacBook Pro, it is much faster:

>>> import torch
... print(torch.__version__)
... import time
... t = torch.randint(0,100,(1000,1000))
... time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
... 

1.6.0
torch.where time: 0.0031599998474121094

I should also mention that the time varies between runs on the server, but it is always longer than 0.01s:

>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.014759302139282227
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.06390643119812012
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.019058942794799805
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.041625261306762695
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.03616046905517578
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.0942380428314209
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.09639406204223633
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.09022760391235352
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.019794464111328125
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.08239603042602539
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.015592098236083984
>>> time_ = time.time();torch.where(t == 0);print('torch.where time:', time.time() - time_)
(tensor([  0,   0,   0,  ..., 999, 999, 999]), tensor([ 65, 254, 354,  ..., 937, 971, 984]))
torch.where time: 0.024471759796142578

Any idea what I should check on the server I’m using? Many thanks.

First things first: using time.time on a single run is not a good benchmark. You’re better off using built-in functions that are meant for this kind of thing. Have a look at the timeit module.
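
As a rough sketch of what that could look like: the helper below (best_time_per_call is a made-up name, not part of PyTorch or timeit) builds the tensor once in the setup string, so only torch.where itself is measured, then repeats the measurement and keeps the fastest run. The repeat and number values are arbitrary choices, and the torch part is skipped if PyTorch is not installed.

```python
import timeit

# Setup runs once per repetition and is NOT included in the measured time.
SETUP = "import torch; t = torch.randint(0, 100, (1000, 1000))"
# Only this statement is timed.
STMT = "torch.where(t == 0)"


def best_time_per_call(stmt, setup="pass", repeat=5, number=20):
    """Return the fastest observed time per call, in seconds.

    Hypothetical helper: runs `stmt` `number` times per repetition and
    keeps the minimum, which is the run least disturbed by other load.
    """
    totals = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    try:
        per_call = best_time_per_call(STMT, setup=SETUP)
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        print(f"torch.where: {per_call * 1e3:.3f} ms per call")
```

Taking the minimum rather than the mean is a deliberate choice here: slower runs usually reflect interference from other processes rather than the cost of the call itself.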

Second, it is not really clear what you are comparing here. You say that things “got slower than usual” but then compare with a MacBook. So what are you comparing? Different versions of PyTorch on the MacBook? Different computers altogether? Please give more information.

Apologies for not being clear. I’m comparing the speed of this line of code on the server I’m using with its normal speed, where I was assuming that the speed on my MacBook is the “normal speed”. Sorry for that. But I do believe that 0.01s is unusual, probably 10x slower than on any other normally working machine. As I said, one day my code suddenly needed 10x more time to execute, and I couldn’t figure out what went wrong.

Thank you for your advice to use timeit. I tried it, and here are the results.
On my slow server:

>>> timeit.timeit('import torch;torch.where(torch.randint(0,100,(1000,1000)) == 0)', number=100)
18.391881837975234

On my MacBook:

>>> timeit.timeit('import torch;torch.where(torch.randint(0,100,(1000,1000)) == 0)', number=100)
1.2651853299998947

Although it is true that desktop CPUs are often faster than server CPUs in single-threaded mode (higher clock speeds), the difference in your example is very large. I tested locally (i7-8700k @ 4.5GHz; 1.01s) and on our server (Xeon Gold 6142 CPU @ 2.60GHz; 2.16s), and the difference is nowhere near as large as yours. Some things you can check:

  • check that both machines run the same versions of PyTorch and Python (this should not matter, but a difference might point to a version-specific bug)
  • make sure the server is not busy (check htop, for instance)
  • if your system is doing a lot of I/O (reading/writing) while you run the benchmark, that can also reduce your CPU throughput
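
To make the first two checks easier to compare between machines, here is a small sketch that prints the relevant facts in one go. The function name environment_report is made up for illustration. The thread count is an extra I am assuming may be worth comparing; it is not one of the checks listed above. The load average is only available on Unix-like systems.

```python
import os
import platform
import sys


def environment_report():
    """Collect facts worth comparing between the two machines."""
    report = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "logical_cpus": os.cpu_count(),
    }
    # os.getloadavg() exists only on Unix-like systems.
    if hasattr(os, "getloadavg"):
        try:
            report["load_avg_1m_5m_15m"] = os.getloadavg()
        except OSError:
            report["load_avg_1m_5m_15m"] = None
    try:
        import torch
    except ImportError:
        report["torch"] = None
    else:
        report["torch"] = torch.__version__
        # Assumption: intra-op thread count may differ between machines.
        report["torch_threads"] = torch.get_num_threads()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Run it on both the server and the MacBook and compare the output. A load average that stays close to or above the number of logical CPUs suggests the server is busy with other work.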