Hey folks, I was trying to benchmark the speed benefit of matrix factorization and have gotten wildly different results across machines.
```python
import timeit

import torch

x = torch.rand(1000)
w = torch.rand([5000, 1000])
a = torch.rand([5000, 100])
b = torch.rand([100, 1000])

print("wx   ", timeit.timeit(lambda: w @ x, number=10000))
print("a(bx)", timeit.timeit(lambda: a @ (b @ x), number=10000))
```
On machine A:

```
wx    1.0115966480225325
a(bx) 0.8654864309355617
```

On machine B:

```
wx    2.858995377959218
a(bx) 0.16533887601690367
```

On machine C:

```
wx    2.5966870239935815
a(bx) 2.271036416757852
```

On machine D:

```
wx    1.2024586703628302
a(bx) 4.790976291522384
```
These results are consistent and reproducible per machine.
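In case BLAS threading is the confound, a variant that pins PyTorch to a single thread and takes the best of several repeats would look like this (a sketch only; the numbers above are from the plain version):

```python
import timeit

import torch

torch.set_num_threads(1)  # pin intra-op parallelism so thread count can't vary

x = torch.rand(1000)
w = torch.rand([5000, 1000])
a = torch.rand([5000, 100])
b = torch.rand([100, 1000])

# min over repeats is less sensitive to background noise than a single run
print("wx   ", min(timeit.repeat(lambda: w @ x, number=1000, repeat=5)))
print("a(bx)", min(timeit.repeat(lambda: a @ (b @ x), number=1000, repeat=5)))
```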
I care about the relative timing per machine, not the absolute numbers: one machine may be faster overall, but the ratio between the two products should still track the matrix sizes.
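For context on the scaling I expect: the factored form does far fewer multiply-adds, so naively `a @ (b @ x)` should win on every machine. The arithmetic (no torch needed):

```python
# Multiply-add count for an (m, n) @ (n,) matvec is m * n
flops_wx = 5000 * 1000                 # w @ x:        5,000,000 MACs
flops_bx = 100 * 1000                  # b @ x:          100,000 MACs
flops_abx = 5000 * 100                 # a @ (b @ x):    500,000 MACs
flops_factored = flops_bx + flops_abx  # 600,000 MACs total
print(flops_wx / flops_factored)       # roughly 8.3x fewer MACs when factored
```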
The measurements are not affected by the PyTorch version (I tried 1.5.0 and 1.12.1), and the dtype is float32 and the device is CPU on every machine.
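Two other things that differ across machines and can dominate CPU matmul timings are the thread count and the BLAS backend PyTorch was built against. A quick diagnostic to run on each machine:

```python
import torch

# Thread count and compiled-in BLAS often explain machine-to-machine
# differences in CPU matmul performance.
print("threads:", torch.get_num_threads())  # intra-op thread count
print(torch.__config__.show())              # build config, incl. BLAS (MKL, OpenBLAS, ...)
```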
Any idea why this is so machine dependent?
Thanks in advance!