There are many factors that can cause differences. Some people have reported things to try here.
Same problem here. Cannot replicate TF Adam optimizer success in PyTorch.
Edit: Disregard. I'm getting better loss in PyTorch than in TF with Adam now that I'm actually taking the mean of my losses.
size_average=False (which sums the per-sample losses instead of averaging them), as found in jcjohnson's GitHub examples, can make for a long night for a newbie.
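To illustrate why a summed loss is hard to compare against an averaged one, here is a minimal plain-Python sketch. The toy data and the helper name batch_loss_and_grad are made up for illustration; in recent PyTorch versions the same choice is, as far as I know, spelled reduction="sum" versus reduction="mean" on the loss modules.

```python
# Squared-error loss for a single weight w on one batch (hypothetical toy data).
def batch_loss_and_grad(w, xs, ys, reduction="mean"):
    """Return (loss, dloss/dw) for predictions w * x with squared error."""
    errors = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors)
    grad = sum(2.0 * e * x for e, x in zip(errors, xs))
    if reduction == "mean":
        # Averaging divides both the loss and its gradient by the batch size.
        n = len(xs)
        loss, grad = loss / n, grad / n
    return loss, grad


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
sum_loss, sum_grad = batch_loss_and_grad(0.0, xs, ys, reduction="sum")
mean_loss, mean_grad = batch_loss_and_grad(0.0, xs, ys, reduction="mean")
print(sum_loss, mean_loss)  # the summed loss is batch_size times the mean loss
print(sum_grad, mean_grad)  # and so is the gradient
```

So a loss curve logged with a summed loss will look several times worse than one logged with a mean loss, even when the model is the same.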
I also have the same problem.
I implemented an AE and a VAE in both Keras (TensorFlow) and PyTorch.
Using Adadelta gave me different loss values, and PyTorch gave the worst results on my network.
I spent two weeks double-checking my code until I found this post.
Thanks, everyone. It is good to know I am not the only one who has run into this issue.
Same problem here!
More specifically, it turns out that PyTorch training with Adam gets stuck at a worse level (in terms of both loss and accuracy) than TensorFlow with exactly the same settings. I came across this issue in two cases:
(1) standard training of a VGG-16 model on CIFAR-10.
(2) generating the CW L2 attack. See https://github.com/carlini/nn_robust_attacks/blob/master/l2_attack.py for details. I reimplemented this attack method to test my model trained with PyTorch. The loss also got stuck at an undesirable level for some images, and the adversarial counterparts couldn't be generated.
Interestingly, I solved these issues by manually halving the learning rate on a schedule (e.g. lr = 0.5 * lr, every 20 epochs). After doing so, PyTorch could reach results comparable to TensorFlow (which did not need its learning rate decayed), and everything works fine for me.
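The halving schedule described above can be sketched in a few lines of plain Python. The function name step_decay_lr and the base rate of 1e-3 are assumptions for illustration; only the "halve every 20 epochs" rule comes from the post. In PyTorch itself, torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5) should give the same schedule.

```python
def step_decay_lr(base_lr, epoch, step_size=20, gamma=0.5):
    """Learning rate at a given epoch when it is multiplied by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)


# Hypothetical base learning rate; epochs 0-19 keep it, 20-39 use half, and so on.
schedule = {epoch: step_decay_lr(1e-3, epoch) for epoch in (0, 19, 20, 40)}
print(schedule)
```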
However, I thought Adam was supposed to adjust its learning rate automatically, so I still don't know the real cause of this.
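One possible piece of the puzzle, offered as a sketch and not as a diagnosis of the models above: Adam rescales each update by running gradient statistics, but the base learning rate still sets the overall step size, and Adam never lowers it on its own. The plain-Python version below follows the standard Adam update rule on a single scalar weight; the constant-gradient setup and the helper names are hypothetical toy choices.

```python
import math


def adam_steps(grad_fn, w, lr, n_steps, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run the standard Adam update on a scalar weight; return w after each step."""
    m = v = 0.0
    history = []
    for t in range(1, n_steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g        # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g    # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
        history.append(w)
    return history


def step_sizes(grad, lr=0.01, n_steps=50):
    """Absolute size of each Adam step when the gradient is a constant (toy case)."""
    history = adam_steps(lambda w: grad, 0.0, lr, n_steps)
    return [abs(b - a) for a, b in zip([0.0] + history, history)]


small = step_sizes(0.01)   # tiny gradient
large = step_sizes(100.0)  # huge gradient
print(small[0], small[-1], large[0], large[-1])  # all close to lr
```

In this toy case every step stays close to lr whether the gradient is tiny or huge, and it does not shrink over time. That would be consistent with an explicit decay schedule still helping late in training, though it does not explain why TensorFlow behaved differently with the same settings.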