I am interested in calculating the psudoinverse, X^p = X^T (X \cdot X^T)^{-1}, for a relatively large matrix X (ideally up to a size of (10000, 60000)). Unfortunately, I am finding this to either be (a) too slow or (b) imprecise for standard implementations. The following test script highlights the problem for a much smaller matrix.

```
import torch as torch
import time
def method_1(X): # manual method
XXt = X @ X.t()
return XXt @ torch.inverse(XXt)
def method_2(X): # pinv method
return X @ torch.linalg.pinv(X)
def method_3(X): # solve method
return X @ torch.linalg.solve(X @ X.t(), X).t()
options = {'manual method': method_1,
'pinv method': method_2,
'solve method': method_3}
torch.manual_seed(0)
X = torch.randn((5000, 7500))
identity = torch.eye(5000)
for device in ['cpu', 'cuda']:
print(f'Device: {device}')
X = X.to(device); identity = identity.to(device)
for name, method in options.items():
start = time.time()
soln = method(X)
end = time.time()
correctness = torch.allclose(soln, identity, rtol=0, atol=1e-5)
tot_time = end - start
print(f'Method: {name}, Correct: {correctness}, Time: {tot_time}' )
```

Which outputs the following using Google Colab (Tesla T4 for the GPU)

```
Device: cpu
Method: manual method, Correct: True, Time: 6.604369878768921
Method: pinv method, Correct: True, Time: 37.75372791290283
Method: solve method, Correct: True, Time: 9.509742498397827
Device: cuda
Method: manual method, Correct: False, Time: 0.31755805015563965
Method: pinv method, Correct: False, Time: 19.725148677825928
Method: solve method, Correct: False, Time: 0.3164680004119873
```

Is there a way in which I could calculate this accurately and in a reasonable time for matrix X of size (10000, 60000)?