How to efficiently orthogonalize a matrix like scipy.linalg.orth()?

How can I orthogonalize a matrix efficiently in PyTorch, on either CPU or GPU?
The function I want produces a random orthogonal matrix. With NumPy and SciPy it can be implemented like this:

import torch
import numpy as np
from scipy.linalg import orth

def get_orth_matrix(N):
    m = np.random.randn(N, N)
    m = orth(m)
    return m

if __name__ == '__main__':
    M = get_orth_matrix(3)
    M = torch.from_numpy(M).float()  # convert the NumPy array to a float32 tensor
    print(M.mm(M.t()))   # should be an identity matrix

I found this code works: :joy:

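def get_orth_matrix(M, N):
    m = torch.randn(N, N)
    # torch.qr is deprecated in recent PyTorch releases; torch.linalg.qr is its replacement
    q, _ = torch.linalg.qr(m)
    return q.t()[:M]  # M orthonormal rows, shape (M, N); requires M <= N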
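
If only M rows are needed, it may be cheaper to factor an (N, M) matrix instead of a full (N, N) one. Below is a minimal sketch of that idea, assuming a PyTorch release that provides torch.linalg.qr. The function name random_orthonormal_rows and its device/dtype parameters are my own, made up for illustration. It also checks the result against the identity, on the GPU if one is available.

```python
import torch

def random_orthonormal_rows(num_rows, dim, device=None, dtype=torch.float64):
    """Return a (num_rows, dim) matrix with orthonormal rows (hypothetical helper)."""
    if num_rows > dim:
        raise ValueError("num_rows must not exceed dim")
    # Sample a (dim, num_rows) Gaussian matrix directly on the target device.
    gaussian = torch.randn(dim, num_rows, device=device, dtype=dtype)
    # Reduced QR: q has shape (dim, num_rows) with orthonormal columns.
    q, _ = torch.linalg.qr(gaussian, mode="reduced")
    # Transpose so the orthonormal vectors are the rows.
    return q.t()

if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    W = random_orthonormal_rows(3, 5, device=device)
    eye = torch.eye(3, device=device, dtype=W.dtype)
    # W @ W.t() should match the 3x3 identity up to rounding error.
    print(torch.allclose(W @ W.t(), eye, atol=1e-8))
```

Note that W.t() @ W is not the identity when num_rows < dim, since only the rows are orthonormal. With num_rows == dim the result is a full orthogonal matrix, as in the snippet above.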