Your code looks generally good!
Could you try to apply the same weight initializations that are used in Keras to compare the models?
Here is a small example.
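A minimal sketch of what such an initialization could look like, assuming your Keras model relies on the default `Dense`/`Conv2D` initializers (`glorot_uniform` for the kernel, zeros for the bias). The helper name `init_keras_style` and the layer sizes are made up for illustration, so adapt them to your model:

```python
import torch
import torch.nn as nn


def init_keras_style(module):
    # Assumption: mimic the Keras defaults for Dense/Conv2D layers,
    # i.e. Glorot (Xavier) uniform weights and zero biases.
    if isinstance(module, (nn.Linear, nn.Conv2d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)


# Hypothetical model, only here to show how the helper is applied.
model = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 1),
)

# apply() calls the function recursively on every submodule.
model.apply(init_keras_style)
```

If your Keras layers pass a custom `kernel_initializer` or `bias_initializer`, you would have to swap in the matching `torch.nn.init` function instead.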
Also, could you post the Keras code, as there still might be some small differences?
Some minor issues:
- `Variable`s are deprecated, and you can use tensors directly since PyTorch `0.4.0`.
- It’s generally recommended to call the model directly instead of `forward`. You could change `self.forward(x)` to `self(x)`.
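Both points can be illustrated with a short sketch. `TinyNet` is a hypothetical module, not your model. The forward hook is only there to show one practical difference: calling the module runs registered hooks, while calling `forward` directly skips them.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)


model = TinyNet()

# Record every time the forward hook fires.
calls = []
model.register_forward_hook(
    lambda module, inputs, output: calls.append(tuple(output.shape))
)

# A plain tensor is enough; no Variable wrapper is needed.
x = torch.randn(3, 4)

out = model(x)                 # goes through __call__, so the hook runs
out_direct = model.forward(x)  # bypasses __call__, so the hook is skipped

print(len(calls))
```

The two outputs are identical here; the difference only shows up once hooks are registered on the module.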