Keras model learns, but PyTorch version doesn't

I have a pseudo-siamese network implemented in Keras, which I train on my image dataset for an image-matching task. I then reimplemented the network in PyTorch, which I prefer to Keras, but the PyTorch version does not train.

I am using the same dataset, the same weight initialisation scheme, and the same learning rate and weight_decay parameters with Adam. The PyTorch dataset code is a wrapper around the Keras dataset code, and I have double- and triple-checked that the networks are the same. The only difference is that in Keras I apply a softmax before the loss, whereas PyTorch's cross-entropy loss (nn.CrossEntropyLoss) applies log-softmax internally, so I leave the softmax out of the PyTorch model.
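Since the loss is the one acknowledged difference, here is a framework-free sketch of what each side computes per sample. It assumes the Keras model ends in a softmax layer followed by categorical cross-entropy, and that the PyTorch model outputs raw logits into nn.CrossEntropyLoss. The helpers (softmax, keras_style_ce, torch_style_ce) are hypothetical plain-Python stand-ins, not either library's code. The two losses agree when each receives the input it expects; a softmax accidentally left at the end of the PyTorch model would be applied twice.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def keras_style_ce(probs, target):
    # Hypothetical stand-in for Keras categorical cross-entropy on
    # probabilities (model already ends in softmax): -log p[target].
    return -math.log(probs[target])


def torch_style_ce(logits, target):
    # Hypothetical stand-in for what nn.CrossEntropyLoss computes per
    # sample: log-softmax of raw logits, then negative log-likelihood.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


logits = [2.0, -1.0, 0.5]
target = 0

# Same loss: Keras sees softmax outputs, PyTorch sees raw logits.
print(keras_style_ce(softmax(logits), target))
print(torch_style_ce(logits, target))

# Pitfall: passing softmax outputs to the PyTorch-style loss applies
# softmax twice and yields a different loss (and flatter gradients).
print(torch_style_ce(softmax(logits), target))
```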

Has anyone else experienced this? Any ideas about what I could try? I have been stuck with this for hours and cannot think what else to check.

The implementations for PyTorch and Keras are here: https://gist.github.com/system123/a660ca6ed7cb02eca500fbb1f3de546e

Just skimming through your code, I’ve noticed that some conv layers use a stride of 2 with the padding option “same”.
Could this be related to the problem?
Apparently “same” padding works differently for different backends in Keras.
If you can provide a small executable code snippet, I could help debug the issue.

That seems to have fixed it: the “same” padding was indeed behaving differently in PyTorch and Keras.
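To make the padding difference concrete, here is a plain-Python sketch, assuming the Keras model ran on the TensorFlow backend. The function names are hypothetical. It computes TensorFlow-style “same” padding for one spatial dimension and compares it with the output of a PyTorch conv that uses a symmetric integer padding. For a 3x3 kernel with stride 2 on an even-sized input, TensorFlow pads zero pixels before and one after, while padding=1 in PyTorch pads one pixel on each side. The output sizes match, but every window is shifted by one pixel, so the two layers are not equivalent.

```python
import math


def tf_same_padding(in_size, kernel, stride, dilation=1):
    # TensorFlow-style "same": output size is ceil(in / stride); the total
    # padding is split so any odd extra pixel goes at the end (bottom/right).
    eff_kernel = (kernel - 1) * dilation + 1
    out_size = math.ceil(in_size / stride)
    total = max((out_size - 1) * stride + eff_kernel - in_size, 0)
    before = total // 2
    after = total - before
    return before, after


def torch_symmetric_output(in_size, kernel, stride, padding, dilation=1):
    # Output size of a PyTorch conv whose integer padding is applied
    # equally to both sides of the dimension.
    eff_kernel = (kernel - 1) * dilation + 1
    return (in_size + 2 * padding - eff_kernel) // stride + 1


# 3x3 kernel, stride 2, 64-pixel dimension.
print(tf_same_padding(64, 3, 2))                    # (0, 1)
print(torch_symmetric_output(64, 3, 2, padding=1))  # 32, but windows shifted by one pixel
```

To reproduce the TensorFlow behaviour in PyTorch, one option is to pad explicitly with torch.nn.functional.pad(x, (left, right, top, bottom)), using these amounts for width and height, and then run nn.Conv2d with padding=0. PyTorch's built-in padding="same" option rejects strided convolutions, so it cannot be used for these stride-2 layers.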
