RNN and Adam: slower convergence than Keras

It looks like you are using different learning rates: lr=0.01 for the Keras model and lr=0.001 for the PyTorch model. That is most likely the main cause of the different convergence rates. Try setting both to the same value and compare again.
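To get a feel for how much a 10x gap in learning rate can matter with Adam, here is a toy sketch in plain Python. It is not your RNN: the objective f(w) = w**2, the starting point, the tolerance, and the function name `adam_steps_to_converge` are all assumptions made for illustration. It only counts how many Adam updates each learning rate needs to get close to the minimum.

```python
import math

def adam_steps_to_converge(lr, start=1.0, tol=1e-2, max_steps=100_000,
                           beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize the toy objective f(w) = w**2 with Adam.

    Returns the number of updates until |w| < tol (or max_steps).
    Hypothetical helper for illustration only.
    """
    w, m, v = start, 0.0, 0.0
    for t in range(1, max_steps + 1):
        grad = 2.0 * w                             # df/dw for f(w) = w**2
        m = beta1 * m + (1 - beta1) * grad         # first-moment estimate
        v = beta2 * v + (1 - beta2) * grad * grad  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)               # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        if abs(w) < tol:
            return t
    return max_steps

steps_fast = adam_steps_to_converge(lr=0.01)   # the lr used on the Keras side
steps_slow = adam_steps_to_converge(lr=0.001)  # the lr used on the PyTorch side
print(f"lr=0.01:  {steps_fast} steps")
print(f"lr=0.001: {steps_slow} steps")
```

On this toy problem the smaller learning rate needs noticeably more updates. In your actual code, the fix would be to pass the same value on both sides, e.g. `torch.optim.Adam(model.parameters(), lr=0.01)` in PyTorch to match `learning_rate=0.01` in the Keras optimizer (assuming `model` is your PyTorch module).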