Hi, I ported my Keras model to PyTorch, but the accuracy doesn't look right: the Keras model consistently scores 1%-2% higher than the PyTorch one.

I used the same initializers (Xavier uniform for kernels, zeros for biases), the same random seed, the same hyperparameters, and the same dataset in the same order (no shuffling). The only remaining difference I can think of is the Adam optimizer. The Keras documentation says:

This epsilon is “epsilon hat” in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.

I searched the forum but couldn't find a solution.

Could there be differences between PyTorch's Adam and Keras's Adam?

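If I recall correctly, the eps term in Adam is handled differently in PyTorch and Keras. Keras uses the "epsilon hat" form quoted above, adding eps to the square root of the uncorrected second-moment moving average, while PyTorch follows Algorithm 1 of the paper and adds eps to the square root of the bias-corrected average. The default values also differ (1e-7 in Keras, 1e-8 in PyTorch). So that may have some effect on the asymptotics of your convergence.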
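To make the eps difference concrete, here is a minimal scalar sketch of one Adam step in each formulation. It is not either library's actual code: the function names are made up for illustration, and the hyperparameter defaults are simply the commonly documented ones. The two forms coincide when eps is 0 and drift apart when gradients are small relative to eps.

```python
import math


def adam_step_algorithm1(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One step in the Algorithm 1 form (hypothetical helper).

    eps is added to the square root of the bias-corrected second moment.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v


def adam_step_epsilon_hat(p, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-7):
    """One step in the "epsilon hat" form (hypothetical helper).

    Bias correction is folded into the step size, and eps is added to the
    square root of the uncorrected second moment.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    lr_t = lr * math.sqrt(1 - b2 ** t) / (1 - b1 ** t)
    p = p - lr_t * m / (math.sqrt(v) + eps)
    return p, m, v


# Same tiny gradient and the same eps value passed to both forms.
p_a, _, _ = adam_step_algorithm1(1.0, 1e-6, 0.0, 0.0, t=1, eps=1e-8)
p_b, _, _ = adam_step_epsilon_hat(1.0, 1e-6, 0.0, 0.0, t=1, eps=1e-8)
print(p_a, p_b)  # the two updates differ noticeably
```

If you want the two frameworks to match, one thing worth trying is setting eps explicitly in both optimizers instead of relying on the defaults, keeping in mind that the same numeric value still does not mean the same thing in the two forms.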

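Also, using the same initializer and the same seed doesn't mean you get the same weights, because the two frameworks use different random number generators. Remember too that nn.Linear and the Keras Dense layer store their weights differently: nn.Linear keeps a weight of shape (out_features, in_features), while Dense keeps a kernel of shape (input_dim, units), i.e. the transpose. So you'll start from different weights and hence may see different performance.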
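The layout difference can be sketched in plain Python, without either framework. The helpers below are hypothetical stand-ins for the two layers' forward passes, written only to show that a Keras-style kernel has to be transposed before it behaves the same way as a PyTorch-style weight.

```python
def dense_forward(x, kernel, bias):
    """Keras-style layout: y = x @ kernel + bias, kernel is (in, out)."""
    n_out = len(bias)
    return [
        sum(x[i] * kernel[i][j] for i in range(len(x))) + bias[j]
        for j in range(n_out)
    ]


def linear_forward(x, weight, bias):
    """PyTorch-style layout: y = x @ weight.T + bias, weight is (out, in)."""
    return [
        sum(w_i * x_i for w_i, x_i in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


kernel = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]   # 2 inputs, 3 outputs (Keras-style)
bias = [0.5, -0.5, 0.0]
x = [1.0, -1.0]

y_keras = dense_forward(x, kernel, bias)
y_torch = linear_forward(x, transpose(kernel), bias)
print(y_keras, y_torch)  # identical once the kernel is transposed
```

In practice, the safest way to rule out initialization as the cause is probably to export the initial weights from one model and load them (transposed for the dense layers) into the other, rather than relying on matching seeds.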

You have to make sure every aspect of the code is the same; if it were, you'd get the same results.
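One way to track down where the two implementations part ways is to log the per-step training loss from both runs and find the first step where they disagree. A small sketch, assuming you have already collected the two loss lists yourself (the function name and tolerance are illustrative, not from either library):

```python
def first_divergence(losses_a, losses_b, tol=1e-6):
    """Return the index of the first step where the two loss curves differ
    by more than tol, or None if they agree over the compared steps."""
    for step, (a, b) in enumerate(zip(losses_a, losses_b)):
        if abs(a - b) > tol:
            return step
    return None


# Hypothetical logged losses from the two runs.
keras_losses = [1.0, 0.5, 0.25]
torch_losses = [1.0, 0.5, 0.3]
print(first_divergence(keras_losses, torch_losses))
```

If the losses already differ at step 0, the forward pass or the initial weights are different; if they only differ from step 1 onward, the optimizer update is the more likely suspect.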