RNN and Adam: slower convergence than Keras

@stefanonardo I ran your code in CPU mode. What exactly should I look for in terms of “non-convergence”?
The training loss seems to have gone down pretty well.

Here’s the code I ran: https://gist.github.com/soumith/ceb0d3de23585e676fd3b5e0402a45a3

Here’s the output log: https://gist.github.com/soumith/bfeeb1f5378030693b05231344c1c3f5