I noticed a big speed gap between cudnnLSTM and LSTMCell + loop.
With the LSTMCell + loop approach, GPU utilization is much lower than with cudnnLSTM.
Can anyone explain the underlying reason? Thank you very much!
NVIDIA have a blog post about how they optimise for RNNs in cuDNN: Optimizing Recurrent Neural Networks in cuDNN 5
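To give a rough feel for one of the ideas in that post (as I understand it, combining the input matrix multiplications across time steps), here is a minimal NumPy sketch. It is not how cuDNN or any framework's LSTMCell is actually implemented; the function names, the weight layout (`W` for inputs, `U` for the recurrent state, gates ordered i, f, g, o) and the shapes are all assumptions made for illustration. The point is only that the input projection does not depend on the previous hidden state, so it can be done as one large matmul instead of many small ones inside the loop.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(pre_input, h, c, U, b):
    # pre_input is the already-computed input projection for this step, shape (B, 4H).
    gates = pre_input + h @ U + b
    i, f, g, o = np.split(gates, 4, axis=1)  # assumed gate order: i, f, g, o
    c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
    h = sigmoid(o) * np.tanh(c)
    return h, c

def lstm_loop(x, W, U, b):
    # "LSTMCell + loop" style: a small input matmul at every time step.
    # x: (T, B, D), W: (D, 4H), U: (H, 4H), b: (4H,)
    T, B, _ = x.shape
    H = U.shape[0]
    h, c = np.zeros((B, H)), np.zeros((B, H))
    outputs = []
    for t in range(T):
        h, c = lstm_step(x[t] @ W, h, c, U, b)
        outputs.append(h)
    return np.stack(outputs)

def lstm_batched_input(x, W, U, b):
    # Same math, but the input projection for all T steps is one large matmul.
    # Only the recurrent part (h @ U) has to stay inside the sequential loop.
    T, B, D = x.shape
    H = U.shape[0]
    xw = (x.reshape(T * B, D) @ W).reshape(T, B, 4 * H)
    h, c = np.zeros((B, H)), np.zeros((B, H))
    outputs = []
    for t in range(T):
        h, c = lstm_step(xw[t], h, c, U, b)
        outputs.append(h)
    return np.stack(outputs)
```

Both functions should return the same values up to floating-point rounding; the second just issues fewer, larger matmuls, which tends to keep a GPU busier. The post lists further optimizations beyond this one, such as fusing the element-wise gate operations, which this sketch does not attempt to show.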