RNN and Adam: slower convergence than Keras

stefanonardo · January 6, 2018, 2:43am

Now I’m getting very bad results with truncated back-propagation. Could anyone check if there are any bugs in my code please? I followed this tutorial to implement the truncated backprop. As I did before I used the same hyper-parameters and initializations both for PyTorch and Keras.
Here are the results after just 100 epochs of training and the links to minimal code:

PyTorch [code]
TR: 0.000440476183096 VL: 0.00169517311316
TR: 0.000462784681366 VL: 0.00128701637499
TR: 0.000823373540768 VL: 0.00211899834873
TR: 0.000430527156073 VL: 0.00167960980949
TR: 0.000533050970649 VL: 0.000932638757326

If you set TIMESTEPS to NaN it will apply the backprop through the entire sequence and you can see that it works good this way. The issue appears only when I truncate the sequence.

Keras [code]
TR: 1.60957323398e-05 VL: 3.12658933101e-06
TR: 1.97489706594e-05 VL: 3.44138302082e-06
TR: 2.47815147053e-05 VL: 5.84050205497e-06
TR: 2.54522322033e-05 VL: 2.236503277e-05
TR: 1.96936488671e-05 VL: 6.55356349568e-06