Matrix Exponential FP16 Support? Fixed-Order Approximation?

Hi all,

I am having problems with the memory consumption of the matrix_exp implementation in PyTorch. As I understand it, the underlying algorithm relies on matrix multiplies. Is there a reason these could not be performed in FP16?
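For context, this is roughly what I do today (shapes and values are just an illustration of my setup, not anything prescribed by the docs): I keep everything else in FP16, upcast only for matrix_exp, and cast the result back down, so all of the intermediate multiplies still allocate FP32 buffers.

```python
import torch

# Rough illustration of my current workaround (shapes are arbitrary):
# matrix_exp runs in FP32 here, so every intermediate buffer it allocates
# is FP32 even though the surrounding model is FP16.
A_half = torch.randn(256, 64, 64, device="cuda", dtype=torch.float16)

torch.cuda.reset_peak_memory_stats()
E_half = torch.matrix_exp(A_half.float()).half()
print(f"peak memory: {torch.cuda.max_memory_allocated() / 2**20:.1f} MiB")
```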

Also, matrix_exp seems to select the order of its approximation dynamically, which makes the memory consumption unpredictable (in some cases there is a significant memory spike that causes CUDA to crash!). I see that there is some planning around fixing the order of the approximation: pytorch/LinearAlgebra.cpp at bd7e99cbb9e7980b89c26ae3fa5596f6e4aaebc4 · pytorch/pytorch · GitHub. Is building from source currently the only way to change this?
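In case it helps clarify what I am after, here is a minimal sketch of the kind of fixed-order approximation I have in mind: a truncated Taylor series with scaling and squaring, built only from matmuls so it can run in FP16 and has a predictable memory footprint. This is my own sketch, not the algorithm matrix_exp actually uses, and the order/squaring values are just assumptions picked for illustration; accuracy is of course worse than the adaptive scheme.

```python
import torch

def matrix_exp_fixed_order(A: torch.Tensor, order: int = 8, squarings: int = 4) -> torch.Tensor:
    """Fixed-order Taylor series for exp(A) with scaling and squaring.

    Not the algorithm torch.matrix_exp uses; just a sketch whose memory use
    does not depend on the norm of A, and which works with FP16 matmuls.
    """
    n = A.shape[-1]
    eye = torch.eye(n, dtype=A.dtype, device=A.device).expand(A.shape).clone()
    # Scale A down so the truncated series stays accurate, square back up later.
    A_scaled = A / (2 ** squarings)
    result, term = eye.clone(), eye
    # Truncated Taylor series: I + A + A^2/2! + ... + A^order/order!
    for k in range(1, order + 1):
        term = term @ A_scaled / k
        result = result + term
    # Undo the scaling: exp(A) = (exp(A / 2^s))^(2^s)
    for _ in range(squarings):
        result = result @ result
    return result

# Example usage on a batch of FP16 matrices (values kept small so the
# truncated series behaves reasonably in half precision).
A = 0.1 * torch.randn(256, 64, 64, device="cuda", dtype=torch.float16)
E = matrix_exp_fixed_order(A)
```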