Hi, I observe that the inverse operation on GPU is slower than on CPU.
I am not sure if this is the right way to profile, but here is what I have done:
>>> import time
>>> import torch
>>> gpu_tensor = torch.randn(3, 3).cuda()
>>> cpu_tensor = torch.randn(3, 3)
>>> def test1():
...     s = time.time()
...     for i in range(50):
...         torch.inverse(cpu_tensor)
...     e = time.time()
...     print(e - s)
...
>>> def test2():
...     s = time.time()
...     for i in range(50):
...         torch.inverse(gpu_tensor)
...     e = time.time()
...     print(e - s)
...
>>> test1()
0.000229120254517
>>> test2()
0.310909032822
If you are timing CUDA ops, you should add a torch.cuda.synchronize() before starting and stopping the timer.
The first CUDA call needs some time to initialize CUDA, so your timing might include this setup cost as well.
Also, CUDA ops are launched asynchronously, so your main thread can continue its execution while the GPU is still busy.
Could you add it to your code and run it again?
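A minimal sketch of that timing pattern, assuming you just want wall-clock numbers (the helper name `time_inverse` and the warm-up count are my own, not from your code):

```python
import time
import torch

def time_inverse(tensor, n_iter=50, warmup=5):
    """Time torch.inverse over n_iter calls, synchronizing around the
    timed region when the tensor lives on the GPU."""
    for _ in range(warmup):
        torch.inverse(tensor)        # warm-up: absorbs one-time CUDA init cost
    if tensor.is_cuda:
        torch.cuda.synchronize()     # drain queued GPU work before starting the timer
    start = time.perf_counter()
    for _ in range(n_iter):
        torch.inverse(tensor)
    if tensor.is_cuda:
        torch.cuda.synchronize()     # wait for the queued inversions to actually finish
    return time.perf_counter() - start

cpu_time = time_inverse(torch.randn(3, 3))
print(f"CPU: {cpu_time:.6f} s")
if torch.cuda.is_available():
    gpu_time = time_inverse(torch.randn(3, 3).cuda())
    print(f"GPU: {gpu_time:.6f} s")
```

Note that for a tiny 3x3 matrix you should still expect the CPU to win, since each GPU call pays kernel-launch overhead that dwarfs the actual computation.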
Ok, so I added torch.cuda.synchronize() before the timer:

>>> import time
>>> import torch
>>> gpu_tensor = torch.randn(3, 3).cuda()
>>> torch.cuda.synchronize()
>>> cpu_tensor = torch.randn(3, 3)
>>> def test1():
...     s = time.time()
...     for i in range(50):
...         torch.inverse(cpu_tensor)
...     e = time.time()
...     print(e - s)
...
>>> def test2():
...     s = time.time()
...     for i in range(50):
...         torch.inverse(gpu_tensor)
...     e = time.time()
...     print(e - s)
...
>>> test1()
0.000277042388916
>>> test2()
0.325435876846