I am using a simple LSTM network that feeds its final output into a single linear layer performing binary classification. I get stable results on any given machine. When training locally on my MacBook versus on a GCE instance, the results appear identical for the first few hundred training samples. However, after I have trained on about 800 samples, the outputs are noticeably different.
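Roughly, the model looks like this (a simplified sketch; the sizes are placeholders, not my actual hyperparameters):

```python
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    # Sketch of the setup: an LSTM whose final hidden state feeds a
    # single linear layer producing one logit for binary classification.
    def __init__(self, input_size=16, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        _, (h_n, _) = self.lstm(x)  # h_n: (num_layers, batch, hidden)
        return self.fc(h_n[-1])     # one logit per sample
```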
I realize that a number of factors affect reproducibility, including the platform. However, from what I have seen, different results seem to be expected only between CPU and GPU. I am seeding the torch RNG, and it appears to generate the same numbers on both machines.
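For reference, this is approximately how I am seeding (a minimal sketch; the seed value is arbitrary):

```python
import random

import numpy as np
import torch

seed = 42  # arbitrary
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
# If a GPU were involved, CUDA would need seeding and deterministic
# algorithms as well:
# torch.cuda.manual_seed_all(seed)
# torch.use_deterministic_algorithms(True)
```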
Is the divergence to be attributed to an accumulation of small floating-point inaccuracies that are slightly different on each architecture?
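For example, this is the kind of order-dependent rounding I have in mind (a toy illustration):

```python
import torch

# Floating-point addition is not associative: at 1e8 the float32 spacing
# is 8, so adding 1.0 first is lost to rounding.
a = torch.tensor(1e8, dtype=torch.float32)
b = torch.tensor(1.0, dtype=torch.float32)
print((a + b) - a)  # tensor(0.)
print((a - a) + b)  # tensor(1.) -- same math, different order
```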
Thanks for the information!
The error after 200 samples is approximately 1e-7, which is right at the limit of float32 precision (machine epsilon is ~1.19e-7), so accumulated rounding differences could be the reason.
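You can check that epsilon and use it as a comparison tolerance. A sketch, where `outputs_macbook.pt` and `outputs_gce.pt` are hypothetical files holding outputs saved on the two machines:

```python
import torch

print(torch.finfo(torch.float32).eps)  # ~1.1920929e-07

# Hypothetical: outputs saved from the two machines.
out_mac = torch.load("outputs_macbook.pt")
out_gce = torch.load("outputs_gce.pt")

# Compare at a tolerance just above float32 eps.
print((out_mac - out_gce).abs().max())
print(torch.allclose(out_mac, out_gce, atol=1e-6))
```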
Just for the sake of debugging, you could use float64 and see if the differences occur a bit later in the training.
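Something along these lines (a sketch; the model and input here are placeholders for your own):

```python
import torch

# Placeholder model and input; substitute your own.
model = torch.nn.LSTM(16, 32, batch_first=True)
x = torch.randn(4, 10, 16)

# Cast all parameters/buffers and the inputs to float64 for the debug run.
model = model.double()
x = x.double()
out, _ = model(x)
print(out.dtype)  # torch.float64
```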