Testing a user-written GRU against GRUCell in PyTorch

I’m trying to write my own GRU code in Python. It implements a modified GRU cell with an additional part controlled by a parameter N. With N=0, the cell should be structurally and mathematically identical to GRUCell in PyTorch. I’m trying to verify this by running both with the same random seed.
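For reference, these are the GRUCell update equations as given in the PyTorch documentation, with input x, previous hidden state h, and sigmoid σ. With N=0 the hand-written cell has to reproduce them exactly, including the easily missed detail that the reset gate r multiplies the whole term (W_hn h + b_hn), bias included.

```latex
\begin{aligned}
r  &= \sigma(W_{ir} x + b_{ir} + W_{hr} h + b_{hr}) \\
z  &= \sigma(W_{iz} x + b_{iz} + W_{hz} h + b_{hz}) \\
n  &= \tanh\big(W_{in} x + b_{in} + r \odot (W_{hn} h + b_{hn})\big) \\
h' &= (1 - z) \odot n + z \odot h
\end{aligned}
```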


GRUCell itself is implemented in C++, while my code is in Python. Should I expect that, with the same random seed and N=0, the whole training run gives the same result? Sorry, I can’t share the implementation code at the moment.

Thank you.



No, you cannot expect training to give exactly the same result. Floating-point operations are not associative, so two implementations that compute the same math in a different order can produce slightly different values. Gradient descent then tends to amplify these tiny differences over many steps, so the two runs quickly diverge.
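A tiny illustration of the non-associativity point, using plain Python floats (IEEE 754 double precision): the same three numbers summed with a different grouping give different results. A C++ kernel and a Python implementation can easily group operations differently in exactly this way.

```python
def grouping_differs(a=0.1, b=0.2, c=0.3):
    # Same three operands, two different groupings of the additions.
    left = (a + b) + c
    right = a + (b + c)
    return left, right, left == right


left, right, same = grouping_differs()
print(left, right, same)  # 0.6000000000000001 0.6 False
```

The difference is only about 1e-16 here, but training repeats millions of such operations and feeds the results back into the weights, which is why the two runs drift apart.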

A better way to test this is to give both modules exactly the same weights (not just the same seed) and check that the difference in outputs and gradients after a single forward/backward pass is below about 1e-5.
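A minimal sketch of that check, assuming the modified cell uses the same parameter names and gate layout as nn.GRUCell (weight_ih, weight_hh, bias_ih, bias_hh, with gates stacked in the order r, z, n). MyGRUCell is a hypothetical stand-in that implements only the plain GRU math, i.e. the N=0 case; the real cell's extra part is not shown. Copying the state dict guarantees identical weights, which the same seed alone does not if the two modules create or initialize their parameters in a different order.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MyGRUCell(nn.Module):
    """Hypothetical hand-written GRU cell (N=0 path only)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        # Same names and shapes as nn.GRUCell so load_state_dict works directly.
        self.weight_ih = nn.Parameter(torch.empty(3 * hidden_size, input_size))
        self.weight_hh = nn.Parameter(torch.empty(3 * hidden_size, hidden_size))
        self.bias_ih = nn.Parameter(torch.empty(3 * hidden_size))
        self.bias_hh = nn.Parameter(torch.empty(3 * hidden_size))

    def forward(self, x, h):
        gi = F.linear(x, self.weight_ih, self.bias_ih)
        gh = F.linear(h, self.weight_hh, self.bias_hh)
        i_r, i_z, i_n = gi.chunk(3, dim=1)
        h_r, h_z, h_n = gh.chunk(3, dim=1)
        r = torch.sigmoid(i_r + h_r)
        z = torch.sigmoid(i_z + h_z)
        n = torch.tanh(i_n + r * h_n)  # r scales h_n including its bias
        return (1 - z) * n + z * h


def compare(input_size=4, hidden_size=3, batch=5):
    torch.manual_seed(0)
    ref = nn.GRUCell(input_size, hidden_size)
    mine = MyGRUCell(input_size, hidden_size)
    mine.load_state_dict(ref.state_dict())  # identical weights, not just same seed

    x = torch.randn(batch, input_size)
    h = torch.randn(batch, hidden_size)

    # Forward: largest absolute difference between the two outputs.
    out_ref, out_mine = ref(x, h), mine(x, h)
    fwd_diff = (out_ref - out_mine).abs().max().item()

    # Backward: same scalar loss on both, then compare gradients by name.
    out_ref.sum().backward()
    out_mine.sum().backward()
    ref_grads = {name: p.grad for name, p in ref.named_parameters()}
    grad_diff = max(
        (p.grad - ref_grads[name]).abs().max().item()
        for name, p in mine.named_parameters()
    )
    return fwd_diff, grad_diff


fwd_diff, grad_diff = compare()
print(f"max forward diff: {fwd_diff:.2e}, max grad diff: {grad_diff:.2e}")
```

If either number is well above 1e-5 in float32, the two cells most likely differ in the math (gate order, where the reset gate is applied) rather than just in rounding.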

Thank you very much. I’ll test that.