I’m trying to replicate torch.linalg.solve in a project where I’m not able to depend on libtorch, and so I’m trying to figure out exactly what it does (and the many layers of abstraction in the C++ code have been hard to navigate). Specifically, I’m interested to know what it’s doing for the CPU and/or cuSOLVER cases. Is it just calling *getrf to get the LU factorization of A and *getrs to solve? When I profile the call, I see a lot of time spent in triangular_solve_cublas (in the cuSOLVER case), which calls directly into *trsm, which seems unrelated. Does anyone have any guesses?
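
For reference, here is a minimal pure-Python sketch of the "factor with *getrf, then solve with *getrs" pipeline described above. It is not PyTorch's implementation; the function names (lu_factor, lu_solve, solve) are made up for illustration. One detail that may bear on the profile: in reference LAPACK, *getrs itself amounts to applying the row interchanges and then doing two triangular solves (*trsm), one with the unit-lower factor L and one with U. So time spent in *trsm is at least consistent with an LU-based solve, though whether the cuSOLVER path goes through *getrs or issues the triangular solves directly is an assumption not confirmed here.

```python
def lu_factor(a):
    """LU factorization with partial pivoting (the role *getrf plays).

    Returns (lu, piv): lu stores the unit-lower L below the diagonal and U on
    and above it; piv[k] is the row that was swapped with row k at step k.
    """
    n = len(a)
    lu = [list(row) for row in a]  # work on a copy
    piv = list(range(n))
    for k in range(n):
        # Pick the largest-magnitude entry in column k as the pivot.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        piv[k] = p
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        # Eliminate below the pivot, storing the multipliers in place.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, piv


def lu_solve(lu, piv, b):
    """Solve A x = b from the packed LU factors (the role *getrs plays)."""
    n = len(lu)
    x = list(b)
    # Step 1: apply the recorded row interchanges to b.
    for k in range(n):
        p = piv[k]
        if p != k:
            x[k], x[p] = x[p], x[k]
    # Step 2: forward substitution with unit-lower L (a triangular solve).
    for i in range(n):
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    # Step 3: back substitution with upper U (a second triangular solve).
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def solve(a, b):
    """Hypothetical stand-in for a single-right-hand-side linear solve."""
    lu, piv = lu_factor(a)
    return lu_solve(lu, piv, b)
```

Calling solve([[2.0, 1.0, 1.0], [4.0, -6.0, 0.0], [-2.0, 7.0, 2.0]], [5.0, -2.0, 9.0]) gives approximately [1.0, 1.0, 2.0]. A real replacement would call the BLAS/LAPACK or cuBLAS/cuSOLVER routines rather than Python loops; the sketch is only meant to show where the triangular solves sit in the pipeline.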