Calculating the psudoinverse quickly and accurately

I am interested in calculating the psudoinverse, X^p = X^T (X \cdot X^T)^{-1}, for a relatively large matrix X (ideally up to a size of (10000, 60000)). Unfortunately, I am finding this to either be (a) too slow or (b) imprecise for standard implementations. The following test script highlights the problem for a much smaller matrix.

import torch as torch
import time

def method_1(X):  # manual method
    XXt = X @ X.t()
    return XXt @ torch.inverse(XXt)

def method_2(X):  # pinv method
    return X @ torch.linalg.pinv(X)

def method_3(X):  # solve method
    return X @ torch.linalg.solve(X @ X.t(), X).t()

options = {'manual method': method_1,
           'pinv method': method_2,
           'solve method': method_3}

X = torch.randn((5000, 7500))
identity = torch.eye(5000)

for device in ['cpu', 'cuda']:
  print(f'Device: {device}')
  X =; identity =
  for name, method in options.items():
      start = time.time()
      soln = method(X)
      end = time.time()
      correctness = torch.allclose(soln, identity, rtol=0, atol=1e-5)
      tot_time = end - start 
      print(f'Method: {name}, Correct: {correctness}, Time: {tot_time}' )

Which outputs the following using Google Colab (Tesla T4 for the GPU)

Device: cpu
Method: manual method, Correct: True, Time: 6.604369878768921
Method: pinv method, Correct: True, Time: 37.75372791290283
Method: solve method, Correct: True, Time: 9.509742498397827
Device: cuda
Method: manual method, Correct: False, Time: 0.31755805015563965
Method: pinv method, Correct: False, Time: 19.725148677825928
Method: solve method, Correct: False, Time: 0.3164680004119873

Is there a way in which I could calculate this accurately and in a reasonable time for matrix X of size (10000, 60000)?