Differences in network prediction output for the same input

I was playing with an autoencoder. The decoder part is defined like this:

    # Decoder: expands the 10-dimensional code back to a 28x28 image
    self.decoder = torch.nn.Sequential(
        torch.nn.Linear(10, 125),
        torch.nn.ReLU(),
        torch.nn.Linear(125, 250),
        torch.nn.ReLU(),
        torch.nn.Linear(250, 500),
        torch.nn.ReLU(),
        torch.nn.Linear(500, 1000),
        torch.nn.ReLU(),
        torch.nn.Linear(1000, 28 * 28),
    )

After training the network without problems, if I do:

    myInput1 = torch.autograd.Variable(torch.Tensor([1, 0, 0, 0, 0, 0, 0, 0, 0, 0]))
    myOutput1 = autoencoder.decoder(myInput1)
    print(myOutput1.data - myOutput1.data)

I obviously get all zeros.
But if I run the exact same input through the network again and subtract the two outputs:

    myOutput2 = autoencoder.decoder(myInput1)
    print(myOutput1.data - myOutput2.data)

I get small numbers, but not zeros.
My question is: why is the result different? Why is it not deterministic if the network is not changing between executions?

This is all on the same machine, without training the network again.

Obviously, I’m missing something here. Thanks in advance.

How small are the differences?
Are they at the level of floating-point precision?

Yes, really small (but the input is exactly the same):

1.00000e-08 *

If you use multithreading (used by default if you run on CPU), multiple GPUs, or some non-deterministic GPU operations, the accumulation between different workers can happen in a different order. Remember that for floats, (a+b)+c != a+(b+c) in general, so small errors of this kind can appear.
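
To make the point about accumulation order concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helpers `naive_sum` and `chunked_sum` are hypothetical names made up for this illustration; they only mimic the idea of several workers each summing a slice and the partial results being combined afterwards, and they are not how PyTorch actually schedules its work.

```python
import math

# Floating-point addition is not associative: the grouping changes the rounding.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False
print(left, right)    # 0.6000000000000001 0.6


def naive_sum(values):
    """Plain left-to-right accumulation, one rounding per addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, n_workers):
    """Hypothetical helper: each 'worker' sums one slice, then the partial sums are added."""
    size = math.ceil(len(values) / n_workers)
    partials = [naive_sum(values[i:i + size]) for i in range(0, len(values), size)]
    return naive_sum(partials)


values = [0.1] * 10
sequential = naive_sum(values)    # a single worker
chunked = chunked_sum(values, 2)  # two workers, combined at the end
print(sequential, chunked)
print(abs(sequential - chunked))  # tiny, on the order of floating-point precision
```

Both results are equally valid roundings of the same mathematical sum, which is why comparing network outputs with a tolerance is usually more meaningful than expecting an exact zero difference.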
