Suboptimal convergence when compared with TensorFlow model

Thanks @tom. I actually used the same initial weights (not just the same initialization method) to train it, so there really does seem to be something fundamentally wrong with the Adam optimizer itself (I also checked the loss multiple times). Unfortunately, I don't have much time to invest in debugging it, since dissecting every aspect of the model takes a lot of time, so I think I'll wait for PyTorch to stabilize. Good luck with your model, by the way!
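
For anyone who wants to check the optimizer in isolation rather than dissect the whole model, here is a minimal sketch of a single Adam update for one scalar parameter, following the published algorithm. The function name `adam_step` and its defaults are assumptions for illustration, not either framework's API. Frameworks may differ in details such as where epsilon is applied, so small deviations from this reference are not necessarily bugs.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter (hypothetical reference helper).

    t is the 1-based step count; m and v are the running first and second
    moment estimates from the previous step (0.0 before the first step).
    """
    # Update the biased moment estimates.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias-correct them for the early steps.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Apply the update.
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# First step from zero-initialized moments.
p, m, v = adam_step(param=1.0, grad=0.5, m=0.0, v=0.0, t=1)
```

Feeding the same gradient sequence to this function and to each framework's optimizer, starting from the same parameter value, would show at which step (if any) the updates start to diverge.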