I ported a simple model (using dilated convolutions) from TensorFlow (written in Keras) to PyTorch (latest stable version), and the convergence is very different in PyTorch: the results are good, but not even close to the results I got with TensorFlow. So I wonder whether the optimizers behave differently in PyTorch. What I have already checked:

- Same parameters for optimizer (Adam)
- Same loss function
- Same initialization
- Same learning rate
- Same architecture
- Same number of parameters
- Same data augmentation / batch size
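For the optimizer item, one thing that may be worth making explicit is Adam's epsilon. As far as I know, the defaults differ between the two libraries (`1e-7` in `tf.keras`, `1e-8` in `torch.optim.Adam`), but please verify that against your installed versions. The sketch below is a plain-Python, textbook Adam update for a single scalar, not either library's actual implementation; `adam_step` and `run` are hypothetical helper names, and the constant tiny gradient is an assumption chosen only to make the effect visible.

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One textbook Adam update for a scalar parameter (eps added outside the sqrt)."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


def run(eps, grad=1e-7, steps=100):
    """Apply `steps` updates with a constant, very small gradient (an assumption)."""
    param, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        param, m, v = adam_step(param, grad, m, v, t, eps=eps)
    return param


if __name__ == "__main__":
    # Compare how far the parameter moves under the two epsilon values.
    print("eps=1e-8:", run(1e-8))
    print("eps=1e-7:", run(1e-7))
```

When gradients are on the same order as epsilon, the two settings give noticeably different step sizes, so passing epsilon explicitly in both frameworks removes one possible source of divergence.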

So what else should I check? It seems that everything has been covered already. Any ideas?

Here is an example of the loss convergence (unfortunately both series are the same color; PyTorch is the lower one). For this loss, higher is better:

Thank you!