Is PyTorch's Adam eps the same as Keras's Adam epsilon?

Hi, I ported my Keras model to PyTorch, but the accuracy seems off: the Keras model's result is always 1-2% higher than PyTorch's.

I used the same initializer (Xavier uniform for the kernel and zeros for the bias), the same random seed, the same hyperparameters, and the same dataset in the same order (no shuffling). The only remaining difference I can think of is the Adam optimizer, and I found that the Keras documentation says:

This epsilon is “epsilon hat” in the Kingma and Ba paper (in the formula just before Section 2.1), not the epsilon in Algorithm 1 of the paper.
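For reference, these are the two forms of the update in Kingma and Ba that the Keras note contrasts (symbols as in the paper: step size alpha, moment estimates m_t and v_t, decay rates beta_1 and beta_2). As far as I know, `torch.optim.Adam` follows the first form. Substituting the bias-corrected estimates into the first form and rearranging gives the second, which shows the two epsilons coincide only if epsilon hat is rescaled at every step t:

```latex
% Algorithm 1: epsilon is added after bias correction
\hat{m}_t = \frac{m_t}{1-\beta_1^t}, \qquad
\hat{v}_t = \frac{v_t}{1-\beta_2^t}, \qquad
\theta_t = \theta_{t-1} - \alpha\,\frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}

% Form given just before Section 2.1: "epsilon hat"
\alpha_t = \alpha\,\frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t}, \qquad
\theta_t = \theta_{t-1} - \alpha_t\,\frac{m_t}{\sqrt{v_t} + \hat{\epsilon}}

% The two updates are identical exactly when
\hat{\epsilon} = \epsilon\,\sqrt{1-\beta_2^t}
```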

I searched the forum but couldn't find a solution.

Could there be differences between PyTorch's Adam and Keras's Adam?

What should I do to make PyTorch match (or beat) Keras? :sob:

Hi @bfss,

If I recall correctly, the eps term in Adam differs between PyTorch and Keras. Keras adds its epsilon ("epsilon hat") to the square root of the uncorrected moving average of squared gradients in the denominator, whereas PyTorch adds eps only after bias-correcting that average. So that may have some effect on how your training converges.
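To make the eps difference concrete, here is a minimal scalar sketch in plain Python (no framework), written from the Kingma and Ba formulas rather than from either library's source. `pytorch_style` adds `eps` after bias-correcting v, and `keras_style` adds an "epsilon hat" to the square root of the uncorrected v, as the Keras docs describe. The function names, the gradient sequence, and the deliberately large eps are made up for illustration. Running it should show that using the same number for both epsilons gives a different result, while the schedule eps_hat(t) = eps * sqrt(1 - beta2**t) reproduces the PyTorch-style result up to rounding.

```python
import math

BETA1, BETA2, LR = 0.9, 0.999, 1e-3


def pytorch_style(grads, eps):
    """Scalar Adam in the Algorithm 1 form: eps is added AFTER bias-correcting v."""
    theta = m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = BETA1 * m + (1 - BETA1) * g
        v = BETA2 * v + (1 - BETA2) * g * g
        m_hat = m / (1 - BETA1 ** t)
        v_hat = v / (1 - BETA2 ** t)
        theta -= LR * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def keras_style(grads, eps_hat):
    """Scalar Adam in the 'epsilon hat' form: eps_hat(t) is added to sqrt of the
    UNcorrected v, and the bias correction is folded into the step size lr_t."""
    theta = m = v = 0.0
    for t, g in enumerate(grads, start=1):
        m = BETA1 * m + (1 - BETA1) * g
        v = BETA2 * v + (1 - BETA2) * g * g
        lr_t = LR * math.sqrt(1 - BETA2 ** t) / (1 - BETA1 ** t)
        theta -= lr_t * m / (math.sqrt(v) + eps_hat(t))
    return theta


# Made-up gradient sequence; eps is deliberately large so the gap is visible.
grads = [0.5, -0.2, 1e-4, 0.3, -1e-5] * 20
eps = 1e-3

reference = pytorch_style(grads, eps)
same_value = keras_style(grads, lambda t: eps)  # epsilon hat == eps
rescaled = keras_style(grads, lambda t: eps * math.sqrt(1 - BETA2 ** t))  # matched

print("same number for both eps:  ", abs(reference - same_value))
print("eps_hat = eps*sqrt(1-b2^t):", abs(reference - rescaled))
```

Since sqrt(1 - beta2**t) approaches 1 as t grows, the mismatch mostly matters during roughly the first few thousand steps. The defaults also differ (1e-7 in tf.keras, 1e-8 in PyTorch), so passing `eps=1e-7` to `torch.optim.Adam` gets closer to Keras, though still not exactly early in training.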

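Also, just because you use the same initializer doesn't mean you get the same weights. Remember that PyTorch's nn.Linear (the counterpart of Keras's Dense layer) stores its weight as (out_features, in_features), the transpose of the Keras kernel's (input_dim, units) layout, and the two frameworks use different random number generators anyway. So even with the same seed you'll draw different weights and hence get different performance.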
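As a sketch of the transpose point, assuming a plain Dense layer whose weights come from Keras's `layer.get_weights()` as `[kernel, bias]`: the helper below (`keras_dense_to_torch_linear` is a hypothetical name, not a library function) transposes the kernel into the `(out_features, in_features)` layout and checks with NumPy that both layouts compute the same output. The layer sizes and values are random stand-ins.

```python
import numpy as np


def keras_dense_to_torch_linear(kernel, bias):
    """Rearrange Keras Dense weights into the layout nn.Linear uses.

    Keras Dense kernel: (input_dim, units);            output = x @ kernel + bias
    nn.Linear weight:   (out_features, in_features);   output = x @ weight.T + bias
    """
    kernel = np.asarray(kernel, dtype=np.float32)
    bias = np.asarray(bias, dtype=np.float32)
    weight = np.ascontiguousarray(kernel.T)  # transpose to (out, in)
    return weight, bias.copy()               # bias shape (units,) is unchanged


# Hypothetical Dense layer with 4 inputs and 3 units (random stand-in values).
rng = np.random.default_rng(0)
kernel = rng.uniform(-1.0, 1.0, size=(4, 3)).astype(np.float32)
bias = np.zeros(3, dtype=np.float32)
weight, torch_bias = keras_dense_to_torch_linear(kernel, bias)

x = rng.standard_normal((2, 4)).astype(np.float32)
keras_out = x @ kernel + bias           # Dense without activation
linear_out = x @ weight.T + torch_bias  # what nn.Linear computes
print(weight.shape, np.allclose(keras_out, linear_out))
```

In PyTorch the arrays could then be loaded with something like `with torch.no_grad(): linear.weight.copy_(torch.from_numpy(weight))`, and likewise for `linear.bias`. Other layer types (convolutions, recurrent layers) may use different layouts and would need their own conversion.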

You need to make sure every aspect of the code is the same; if everything matched, you'd expect essentially the same results.

@AlphaBetaGamma96

Thank you very much for your reply~

I tried about 20 different random seeds and exported Keras's weights to PyTorch, and Keras was always better than PyTorch on my dataset.

Even with identical weights, Keras reaches 97.75% accuracy while PyTorch reaches 96.52%, so I think Adam is what makes the difference.

Maybe I should try other optimizers.

Can you give me some advice on which optimizer is suitable for text classification?