There are many factors that can cause differences. Some people have reported things to try here.
Same problem here. I cannot replicate the success of TF's Adam optimizer in PyTorch.
Edit: Disregard. I'm now getting a better loss in PyTorch than in TF with Adam, now that I'm taking the mean of my losses instead of the sum.
The size_average=False setting found in jcjohnson's GitHub examples sums the loss over the batch instead of averaging it, which can make for a long night for a newbie.
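For anyone hitting the same thing, here is a minimal sketch of why the two settings give such different numbers. It uses plain Python rather than PyTorch, and the toy data, the one-parameter model, and the helper name `squared_error_loss_and_grad` are all made up for illustration. In current PyTorch releases the same choice is made with the `reduction` argument of the loss (`'sum'` versus `'mean'`), since `size_average` is deprecated.

```python
# Hypothetical toy data for a one-parameter linear model y = w * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5  # current parameter value


def squared_error_loss_and_grad(w, xs, ys, average):
    """Return (loss, dloss/dw) for squared error over the batch.

    average=False mimics a summed loss (like size_average=False);
    average=True mimics a mean loss.
    """
    errors = [w * x - y for x, y in zip(xs, ys)]
    loss = sum(e * e for e in errors)
    grad = sum(2.0 * e * x for e, x in zip(errors, xs))
    if average:
        n = len(xs)
        loss, grad = loss / n, grad / n
    return loss, grad


sum_loss, sum_grad = squared_error_loss_and_grad(w, xs, ys, average=False)
mean_loss, mean_grad = squared_error_loss_and_grad(w, xs, ys, average=True)

# The summed loss and its gradient are both batch_size times the mean version.
print("sum :", sum_loss, sum_grad)
print("mean:", mean_loss, mean_grad)
```

With a summed loss, both the reported loss value and the gradient grow with the batch size, so loss curves are not directly comparable to a framework that averages, and a learning rate tuned for one setting may not transfer to the other.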