Looking at ways to do multi-linear regression with PyTorch, since my benchmarks show it performs really well for basic matrix multiplication / inversion.

Now, I’m a bit confused by the performance of `pinverse()`, and I’m also wondering about precision after seeing this thread:

In my benchmarks, I solve a multi-linear equation with several methods, using 100,000 samples and 150 params:

**Basic matrix inversion** (see code below)

*i7 2.8GHz (8 cores)*

- 96.4 ms ± 1.3 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

*Dual Xeon E5-2650 v2 @ 2.60GHz (32 cores)*

- 176 ms ± 4.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

There’s a 1.8x factor between the two machines here, which I find odd to begin with, but that’s not even the main concern.

The **Moore–Penrose pseudo-inverse** version takes:

*i7 2.8GHz (8 cores)*

- 1.87 s ± 64 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

*Dual Xeon E5-2650 v2 @ 2.60GHz (32 cores)*

- 903 ms ± 38.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) on CPU

Here the dual Xeon is noticeably about 2x faster.

For comparison, **statsmodels.OLS** (which uses a version of the Moore–Penrose pseudo-inverse) gives me this:

*i7 2.8GHz (8 cores)*

- 1.89 s ± 129 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

*Dual Xeon E5-2650 v2 @ 2.60GHz (32 cores)*

- 2.8 s ± 206 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

OLS and pinverse perform about the same on the i7, but on the Xeon it’s night and day.

What is going on here? It seems due to the hardware difference, but both machines have several cores and the timings don’t scale with the core count.

PS: the timings include the numpy-to-tensor casting, but I benchmarked that separately and it is negligible.
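For context, here is a synthetic setup with the same shapes as my benchmark (the actual data is different, but the shapes match; random Gaussian values are just a stand-in):

```python
import numpy as np

# Hypothetical data matching the benchmark shapes:
# 100,000 samples, 150 parameters.
rng = np.random.default_rng(0)
n, m = 100_000, 150
x = rng.standard_normal((n, m))
true_w = rng.standard_normal(m)
# Linear response plus a little noise.
y = x @ true_w + 0.01 * rng.standard_normal(n)
```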

My code:

```
# simple inverse-matrix linear equation solving via the normal equations
import time

import numpy as np
import torch

t = time.time()
n = x.shape[0]
m = x.shape[1]
x1 = np.append(np.ones((n, 1)), x.reshape((n, m)), 1)  # prepend an intercept column
tx = torch.from_numpy(x1)
ty = torch.from_numpy(y.reshape((n, 1)))
# beta = (X^T X)^-1 X^T y
beta = (tx.transpose(0, 1).matmul(tx)).inverse().matmul(tx.transpose(0, 1).matmul(ty))
dt = time.time() - t
```
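(Not part of my benchmark, but for reference: the explicit `inverse()` can be replaced by a linear solve of the normal equations, which is generally faster and more stable. A minimal sketch, assuming a PyTorch version that provides `torch.linalg.solve`, with small made-up shapes:)

```python
import numpy as np
import torch

# Sketch: solve A w = b directly instead of forming A^-1 explicitly.
n, m = 1000, 5  # small illustrative shapes, not the benchmark ones
x = np.random.randn(n, m)
y = np.random.randn(n, 1)
x1 = np.append(np.ones((n, 1)), x, 1)  # prepend an intercept column
tx = torch.from_numpy(x1)
ty = torch.from_numpy(y)
A = tx.transpose(0, 1).matmul(tx)  # (m+1, m+1) Gram matrix X^T X
b = tx.transpose(0, 1).matmul(ty)  # (m+1, 1) right-hand side X^T y
w = torch.linalg.solve(A, b)       # avoids computing A.inverse()
```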

```
# Moore–Penrose pseudo-inverse version
t = time.time()
n = x.shape[0]
m = x.shape[1]
x1 = np.append(np.ones((n, 1)), x.reshape((n, m)), 1)  # prepend an intercept column
tx = torch.from_numpy(x1)
ty = torch.from_numpy(y)
mpinv = torch.pinverse(tx)                 # pinv(X), computed internally via SVD
beta = torch.tensordot(mpinv, ty, dims=1)  # beta = pinv(X) @ y
dt = time.time() - t
```
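One thing I plan to check on both machines is PyTorch’s intra-op thread count; differing defaults could explain why the timings don’t scale with core count (just a hypothesis). A minimal sketch:

```python
import torch

# Sketch: inspect and pin the intra-op thread pool before timing, so the
# two machines run an apples-to-apples comparison.
print(torch.get_num_threads())  # default is typically the physical core count
torch.set_num_threads(8)        # hypothetical pin; pick a value both machines have
print(torch.get_num_threads())
```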