I have implemented the same unsupervised learning task in both PyTorch (PT) and TensorFlow (TF). The network structures are identical, the training data come from the same distribution, and I use the same loss function and other hyper-parameters. The strange thing is that, after the same number of training batches, the test performance obtained with TF is much better than with PT, even though the testing setup is also identical. I have tried training for more iterations in PT but still get poor performance. Has anyone observed such a phenomenon? I have checked the PT and TF code again and again, and there is nothing wrong with either. I really can't understand why this happens.
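One thing worth ruling out: the two frameworks use different default weight initializers, so "same architecture and hyper-parameters" does not guarantee the same starting point. A minimal sketch of one way to align them, assuming a single dense layer with hypothetical shapes (10 in, 5 out): generate the initial weights once in NumPy and load them into the PyTorch model, then load the same arrays into the TF model.

```python
import numpy as np
import torch
import torch.nn as nn

# Generate one shared set of initial weights in NumPy.
# Shapes are hypothetical stand-ins for your layer sizes.
rng = np.random.default_rng(0)
w = rng.standard_normal((10, 5)).astype(np.float32) * 0.05  # (in, out), TF layout
b = np.zeros(5, dtype=np.float32)

layer = nn.Linear(10, 5)
with torch.no_grad():
    # PyTorch stores Linear weights as (out, in), so transpose.
    layer.weight.copy_(torch.from_numpy(w.T))
    layer.bias.copy_(torch.from_numpy(b))

# The same w and b can be assigned to the TF Dense layer without transposing,
# since Keras kernels are stored as (in, out).
x_np = rng.standard_normal((4, 10)).astype(np.float32)
y = layer(torch.from_numpy(x_np))
```

If both models produce the same loss on the same first batch after this, the gap is coming from training dynamics (optimizer defaults, e.g. Adam's epsilon, differ between the frameworks too) rather than the model itself.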
Could you please share a minimal executable example?
Also, could you try profiling the code and share the output here? You can refer to this link for PyTorch; it's a great tutorial on profiling models. I'm not sure about TensorFlow, so please check their documentation for a similar approach.
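For reference, a minimal profiling sketch with `torch.profiler` (the tiny model here is just a stand-in for your actual network):

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Stand-in model; substitute your own network and a real batch.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
x = torch.randn(32, 64)

# Profile one forward + backward pass on CPU
# (add ProfilerActivity.CUDA to the list when running on GPU).
with profile(activities=[ProfilerActivity.CPU], record_shapes=True) as prof:
    loss = model(x).sum()
    loss.backward()

# Print the most expensive ops, sorted by total CPU time.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```

Comparing this table against the equivalent TF profile would at least show whether the two runs are executing comparable work per batch.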