Suboptimal convergence when compared with TensorFlow model

In my experiment, however, I followed these two steps and ended up with similar results:

  • Used nn.init.xavier_uniform_ for the weights and nn.init.constant_ for the biases.
  • In the Adam optimizer, PyTorch defaults to eps=1e-8, whereas TensorFlow defaults to epsilon=1e-7. I changed the PyTorch value to 1e-7.
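
For anyone who wants to try the same thing, here is a rough sketch of both changes. The model, the layer sizes, the learning rate, and the helper name init_like_keras are placeholders I made up for illustration, and the bias value of 0.0 is my assumption (it matches the Keras default of zero-initialized biases), so adapt it to your own network.

```python
import torch
from torch import nn


def init_like_keras(module: nn.Module) -> None:
    """Hypothetical helper: Xavier (Glorot) uniform weights, constant biases."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            # Assumption: biases set to 0.0, as Keras layers do by default.
            nn.init.constant_(module.bias, 0.0)


# Placeholder model; substitute your own architecture.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
model.apply(init_like_keras)  # apply() visits every submodule recursively

# PyTorch's Adam defaults to eps=1e-8; set it to 1e-7 to match TensorFlow.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-7)
```

Calling model.apply(...) after building the model overrides PyTorch's default layer initialization, so it has to run before training starts.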

Hope this helps.