CUDA does not implement the trtrs function for solving triangular systems. However, through CUBLAS it provides the trsm and trsv functions, which appear to do the same job as trtrs: trsm solves for a matrix of right-hand sides and trsv for a single right-hand-side vector.
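To make the comparison concrete, here is a minimal pure-Python sketch of what the three routines compute. It is not the BLAS/LAPACK/CUBLAS API. It assumes row-major nested lists and a non-transposed A on the left, and it leaves out the transpose, side and unit-diagonal options. The function names only mirror the real routines. As far as I know, reference LAPACK's trtrs first checks the diagonal of A for exact zeros, returns info > 0 if it finds one, and otherwise calls trsm. That singularity check looks like the main thing trsm/trsv would not give you.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def trsv(a: Matrix, b: List[float], lower: bool = True) -> List[float]:
    """Solve a @ x = b for triangular a (sketch of BLAS trsv semantics).

    Like trsv, the diagonal is not checked beforehand; a zero pivot here
    simply raises ZeroDivisionError instead of producing inf/nan.
    """
    n = len(a)
    x = list(b)
    if lower:
        # Forward substitution: row i only needs x[0..i-1], already solved.
        for i in range(n):
            s = x[i] - sum(a[i][j] * x[j] for j in range(i))
            x[i] = s / a[i][i]
    else:
        # Back substitution: row i only needs x[i+1..n-1], already solved.
        for i in range(n - 1, -1, -1):
            s = x[i] - sum(a[i][j] * x[j] for j in range(i + 1, n))
            x[i] = s / a[i][i]
    return x


def trsm(a: Matrix, b: Matrix, alpha: float = 1.0, lower: bool = True) -> Matrix:
    """Solve a @ X = alpha * b for X (sketch of BLAS trsm, left side only).

    Each column of b is an independent right-hand side, so this is just
    trsv applied column by column.
    """
    n, k = len(b), len(b[0])
    cols = [[alpha * b[i][c] for i in range(n)] for c in range(k)]
    solved = [trsv(a, col, lower) for col in cols]
    # Transpose the list of solved columns back into row-major form.
    return [[solved[c][i] for c in range(k)] for i in range(n)]


def trtrs(a: Matrix, b: Matrix, lower: bool = True) -> Tuple[Matrix, int]:
    """Sketch of LAPACK trtrs: singularity check, then trsm.

    Returns (X, info). info = i > 0 means a[i-1][i-1] is exactly zero,
    in which case b is returned unsolved (1-based index, as in LAPACK).
    """
    for i, row in enumerate(a):
        if row[i] == 0.0:
            return [list(r) for r in b], i + 1
    return trsm(a, b, 1.0, lower), 0
```

For example, `trtrs([[2.0, 0.0], [1.0, 4.0]], [[2.0, 4.0], [9.0, 6.0]])` returns `([[1.0, 2.0], [2.0, 1.0]], 0)`, the same X that `trsm` gives. With a zero on the diagonal, trtrs reports info > 0 instead of dividing by zero.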
As far as I can tell, common CPU BLAS implementations (e.g. OpenBLAS and MKL) provide trsm and trsv. MAGMA also implements both.
I’m not entirely sure how PyTorch manages context handles and calls CUDA code. Would it be enough to add the trsm and trsv functions to pytorch/torch/lib/TH/generic/THBlas., the same way trtrs is added in pytorch/torch/lib/TH/generic/THLapack.?
– Juan Camilo