Optimizing custom RNN implementation

Hi @tom,

Fair enough, that benchmark seems more useful.

Which would you say is more efficient: implementing the LSTM outer loop in Python and calling the optimized function for each cell (as the LLTM example would imply, see the sketch below), or implementing the whole thing in C++ as a single function or ATen module? I guess I'm trying to pinpoint exactly how it is done in the current implementation of LSTM.
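To make sure I'm describing the first option clearly, here is a minimal sketch of what I mean, with `torch.nn.LSTMCell` standing in for a custom compiled cell (like the LLTM tutorial's forward function); the shapes and names are just for illustration:

```python
import torch

batch, seq_len, input_size, hidden_size = 4, 10, 32, 64
cell = torch.nn.LSTMCell(input_size, hidden_size)  # placeholder for the fused/optimized cell

x = torch.randn(seq_len, batch, input_size)
h = torch.zeros(batch, hidden_size)
c = torch.zeros(batch, hidden_size)

outputs = []
for t in range(seq_len):          # outer loop over time steps stays in Python
    h, c = cell(x[t], (h, c))     # one extension/ATen call per time step
    outputs.append(h)
output = torch.stack(outputs)     # (seq_len, batch, hidden_size)
```

The second option would move that whole loop into C++, so I guess the question is whether the per-step overhead of the Python loop is significant in practice.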

Thanks for the help