Suboptimal convergence when compared with TensorFlow model

(Kai Arulkumaran) #21

There are many factors that can cause differences. Some people have reported things to try here.

(Jeff) #22

Same problem here. Cannot replicate TF Adam optimizer success in Pytorch.

Edit: Disregard. I’m actually getting better loss in Pytorch over TF with Adam now that I’m actually taking the mean of my losses.
size_average=False found in jcjohnson’s github examples can make for a long night for a newbie.

(baboonga) #23

I also have the same problem.
I implemented AE and VAE on both Keras(Tensorflow) and Pytorch.
Using Adadelta gave me different loss values and Pytorch did the worst thing on my network.
I spent 2 weeks to double check my codes untill I found this post.
Thank you guys that I am not the only one who experiences this issue.


Same problem here!

More specifically, it turns out that Pytorch training with Adam will stuck at a worse level (in terms of both loss and accuracy) than Tensorflow with exactly the same setting. I came across this issue in two process:

(1) standard training of a VGG-16 model with CIFAR-10 as dataset.
(2) generating CW L2 attack. See for details. I reproduce this attack method to test my model trained with Pytorch. The loss also stuck at a undesirable level for some images, and the adversarial counterparts couldn’t be generated.

Interestingly, I solved these issues by manually letting the learning rate decay to its half at scheduled step (e.g. lr = 0.5 * lr, every 20 epochs). After doing so, Pytorch could reach comparable results as Tensorflow (without decaying its learning rate), and everything works fine for me.

However, I think that actually Adam should adjust its learning rate automatically. So I still don’t know the true reason for this.