In my experiment, however, I followed these two steps and ended up with similar results:
- Used `nn.init.xavier_uniform_` for the weights and `nn.init.constant_` for the biases.
- In the Adam optimizer, PyTorch uses a default of `eps=1e-8` vs TensorFlow's `epsilon=1e-7`. Changed it to 1e-7.
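
For reference, the two changes above can be sketched roughly like this. It is only a minimal sketch: the model, the layer sizes, the helper name `init_like_keras`, and the learning rate are made up for illustration, and the bias value of 0.0 is an assumption (it mirrors Keras' default zero bias initializer, which was not stated above).

```python
import torch
from torch import nn


def init_like_keras(module: nn.Module) -> None:
    """Apply Glorot/Xavier-uniform weights and constant biases to Linear/Conv2d layers."""
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            # Assumption: 0.0, to match Keras' default "zeros" bias initializer.
            nn.init.constant_(module.bias, 0.0)


# Hypothetical model, used only to show where the calls go.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
model.apply(init_like_keras)  # apply() visits every submodule recursively

# PyTorch's Adam defaults to eps=1e-8; pass eps=1e-7 to match TensorFlow's default epsilon.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, eps=1e-7)
```

Using `model.apply(...)` keeps the initialization in one place, so it also covers layers nested inside submodules.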
Hope this helps