Higher MSE than Keras with the same setup when training a custom RNN

Hi,
I am very much attracted to PyTorch and have recently started using it instead of Keras with Theano as the backend.

However, when I train my custom RNN model in PyTorch, I get an MSE of approximately 1e-5, which is far higher than in Keras, where I trained the same model and got an MSE of approximately 1e-10.

I am afraid that the way BPTT is implemented in PyTorch, and the optimizers PyTorch provides for training RNNs, are not as efficient as those in Theano.

If anyone can guide me in resolving this problem, it would be greatly appreciated.