If you have algebraic structure, such as s.p.d. matrices, it might be worth doing the Cholesky decomposition and taking twice the sum of the logs of the diagonal. This can be more efficient than the somewhat elaborate logdet backward.
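A minimal sketch of what I mean (the function name `spd_logdet` is just for illustration): for s.p.d. `A = L Lᵀ`, `det(A) = prod(diag(L))²`, so the log-determinant is twice the sum of the logs of `L`'s diagonal, and autograd differentiates through the Cholesky directly.

```python
import torch

def spd_logdet(A: torch.Tensor) -> torch.Tensor:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    A = L L^T  =>  det(A) = prod(diag(L))^2
               =>  logdet(A) = 2 * sum(log(diag(L)))
    Works on batches; differentiable through torch.linalg.cholesky.
    """
    L = torch.linalg.cholesky(A)
    return 2 * torch.diagonal(L, dim1=-2, dim2=-1).log().sum(-1)

# Quick sanity check against the built-in logdet:
M = torch.randn(4, 4, dtype=torch.float64)
A = M @ M.T + 4 * torch.eye(4, dtype=torch.float64)  # make it s.p.d.
print(torch.allclose(spd_logdet(A), torch.logdet(A)))
```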

The original PR introducing logdet has some discussion around the reasoning, but I must admit I don’t fully follow it (and maybe things have changed since).

Note that your benchmarking is wrong for CUDA: you need to call torch.cuda.synchronize() before taking times (at both start and end), because CUDA kernels are launched asynchronously and the Python call returns before the work on the GPU finishes.
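Something along these lines (a sketch of the timing pattern, with a hypothetical `timed` helper; adapt to your own benchmark loop):

```python
import time
import torch

def timed(fn, n_iter: int = 100) -> float:
    """Average wall-clock time per call of fn, CUDA-safe."""
    fn()  # warm-up (CUDA context init, kernel compilation, caches)
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # drain pending GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_iter):
        fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for the GPU to finish before stopping the clock
    return (time.perf_counter() - start) / n_iter

# Example usage (runs on CPU if no GPU is available):
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(256, 256, device=device)
A = A @ A.T + 256 * torch.eye(256, device=device)
print(f"logdet: {timed(lambda: torch.logdet(A)):.3e} s/iter")
```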