I’m currently testing a variant of the LSTM architecture called subLSTM. I was trying to get an efficient implementation to speed up my tests, since my PyTorch implementation is still very slow compared to the library LSTM. I also tried TorchScript, but it’s still much slower than the LSTM version. Specifically, I used jit.script to compile the inner loop of the RNN (similar to the LSTM implementation). So my questions are:
- Why doesn’t jit.script make it as fast as the built-in LSTM? Is it a problem with the way I’m using jit.script?
- I was trying to access the RNN implementations in _C._VariableFunctions, but I can’t find them. Is there any way of modifying that code? The modifications I need are trivial, and while I could implement them myself, I probably wouldn’t end up with code as efficient, since I’m not that experienced with CUDA (and my C++ is pretty rusty).
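For reference, the scripted cell looks roughly like this (a minimal sketch, not my exact code; the names are illustrative, and the gate equations follow my reading of the subLSTM paper, where gating is subtractive rather than multiplicative and all activations are sigmoids):

```python
import torch
import torch.jit as jit

@jit.script
def sublstm_cell(x, h, c, w_ih, w_hh, b):
    # Compute all four gate pre-activations in one matmul each, then
    # apply sigmoid to everything (subLSTM uses sigmoids throughout).
    gates = torch.sigmoid(torch.mm(x, w_ih.t()) + torch.mm(h, w_hh.t()) + b)
    z, i, f, o = gates.chunk(4, dim=1)
    # Subtractive gating: the input and output gates are subtracted
    # instead of multiplied, unlike the standard LSTM.
    c = f * c + z - i
    h = torch.sigmoid(c) - o
    return h, c
```

The per-timestep Python loop calling this cell is what I wrapped in jit.script, mirroring the custom-LSTM examples, but it still doesn’t come close to the fused cuDNN kernel that nn.LSTM dispatches to.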
Here is a link to my code: