Worse performance than TensorFlow

Recently, I implemented a seq2seq model (the original version was in TensorFlow). The model has two decoding modes:
Mode 1: every step is decoded by the same decoder; s comes from the encoder.
Mode 2: the decoder cell varies by timestep, so a different cell is used at each step; s is the same as in mode 1.
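
To make the difference between the two modes concrete, here is a minimal PyTorch sketch. It is illustrative only and is not the code of either implementation: the choice of `nn.GRUCell`, the class names `SharedDecoder` and `PerStepDecoder`, and the `max_steps` parameter are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class SharedDecoder(nn.Module):
    """Mode 1 (sketch): one cell reused at every decoding step."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)  # GRU is an assumption

    def forward(self, inputs, s):
        # inputs: (steps, batch, input_size); s: (batch, hidden_size) from the encoder
        h, outputs = s, []
        for x_t in inputs:
            h = self.cell(x_t, h)
            outputs.append(h)
        return torch.stack(outputs)  # (steps, batch, hidden_size)


class PerStepDecoder(nn.Module):
    """Mode 2 (sketch): a separate cell, with its own weights, per step."""

    def __init__(self, input_size, hidden_size, max_steps):
        super().__init__()
        # nn.ModuleList registers each cell's parameters with the module,
        # so they appear in .parameters() and are passed to the optimizer.
        self.cells = nn.ModuleList(
            [nn.GRUCell(input_size, hidden_size) for _ in range(max_steps)]
        )

    def forward(self, inputs, s):
        # Same initial state s as in mode 1; only the cell changes per step.
        h, outputs = s, []
        for cell, x_t in zip(self.cells, inputs):
            h = cell(x_t, h)
            outputs.append(h)
        return torch.stack(outputs)
```

In a sketch like this, the per-step cells have to live in an `nn.ModuleList`. If they are kept in a plain Python list, their parameters are not registered on the module, so `model.parameters()` does not hand them to the optimizer and they stay at their initial values.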

OK, I implemented mode 1 in PyTorch and reached the same performance as the TF version. But mode 2 performs worse: the TF version is about 3% better than mine. The TF version was released by the paper's author, who said mode 2 is about 3% better than mode 1, but I get almost the same results with both modes. Did I make a mistake somewhere? (I use the same hyperparameters and the Adam optimizer.)
Link to the TF version: