Same model, same parameters, different result for same input

fdokic · February 19, 2020, 2:26pm

Hi all

I have two instantiations of a model, gn and gx, both consisting of the following operators:
I instantiated gx using exactly the same parameters as gn (reference).

What I did:

I passed the same input, operator by operator, through both models and saved the intermediate results in lists. (Whether the model is a Sequential or a custom class shouldn’t matter, right? A hint that it doesn’t: another model works with exactly that setup.)
The results differ from the 5th-last layer onward (starting at the ConvTranspose2d).
The parameters of all operators are the same, which I checked using:

out_x = list(gx.parameters())
out_n = list(gn.parameters())
for i in range(len(out_x)):
    print(torch.all(out_x[i].eq(out_n[i])).item(), torch.allclose(out_x[i], out_n[i]))

They are all True (same parameters).

Storing the intermediate results of each layer in out_x for gx and out_n for gn and comparing the differences, I get the following:

In [12]: [torch.sum(torch.abs(out_x[i] - out_n[i])).item() for i in range(len(out_x))]
Out[12]:
[0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.0,
 0.00011764606460928917,
 0.00026547431480139494,
 0.00026547431480139494,
 0.0001205947482958436,
 0.00011362379882484674]

Does anyone have a clue what the reason could be? I don’t understand why the outputs suddenly diverge near the end. If the parameters were the cause, wouldn’t they differ earlier?

The other model (similar size and operators) also has slightly different hyperparameters, e.g. the alphas of LeakyReLU, but still manages to produce equal outputs…

Any ideas are highly appreciated!
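For reference, the operator-by-operator comparison described above can be sketched roughly as follows. This is only an illustration: the toy architecture, the input shape, and the helper name `layerwise_outputs` are assumptions, not the actual gn/gx models, and it assumes both models are `nn.Sequential` so they can be iterated layer by layer.

```python
import copy

import torch
import torch.nn as nn


def layerwise_outputs(model, x):
    """Run x through an nn.Sequential one operator at a time, keeping every output."""
    outputs = []
    with torch.no_grad():
        for layer in model:
            x = layer(x)
            outputs.append(x)
    return outputs


torch.manual_seed(0)

# Hypothetical stand-in for the reference model (the real architecture is not shown).
gn = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.LeakyReLU(0.2),
    nn.ConvTranspose2d(8, 3, kernel_size=4, stride=2, padding=1),
    nn.Tanh(),
).eval()

# Second instantiation with exactly the same parameters.
gx = copy.deepcopy(gn).eval()

x = torch.randn(1, 3, 16, 16)
out_n = layerwise_outputs(gn, x)
out_x = layerwise_outputs(gx, x)

# Largest absolute difference per layer; the first nonzero entry marks
# the operator where the two models start to diverge.
diffs = [(a - b).abs().max().item() for a, b in zip(out_x, out_n)]
print(diffs)
```

Using the maximum absolute difference instead of the sum makes the numbers independent of tensor size, which is easier to compare across layers.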

ptrblck · February 20, 2020, 12:54am

Did you call model.eval() on both models and could you also check the .buffers()?
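A minimal sketch of such a buffer check, assuming a toy model with a BatchNorm layer (the helper name `buffers_match` and the architecture are made up for illustration). Buffers such as BatchNorm running statistics are not returned by `.parameters()`, so a parameter-only comparison can miss them.

```python
import copy

import torch
import torch.nn as nn


def buffers_match(a, b):
    """True if both models have the same buffer names and identical buffer values."""
    buf_a = dict(a.named_buffers())
    buf_b = dict(b.named_buffers())
    return buf_a.keys() == buf_b.keys() and all(
        torch.equal(buf_a[k], buf_b[k]) for k in buf_a
    )


torch.manual_seed(0)

# Hypothetical model; BatchNorm2d registers running_mean, running_var
# and num_batches_tracked as buffers.
gn = nn.Sequential(nn.Conv2d(3, 4, kernel_size=3), nn.BatchNorm2d(4)).eval()
gx = copy.deepcopy(gn)
match_before = buffers_match(gn, gx)

# One forward pass in train mode updates the running statistics of gx only.
gx.train()
with torch.no_grad():
    gx(torch.randn(2, 3, 8, 8))
gx.eval()
match_after = buffers_match(gn, gx)

print(match_before, match_after)
```

After the train-mode forward pass the parameters of both models are still identical, but the buffers are not, and the two models would produce different outputs in eval mode.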

fdokic · February 20, 2020, 9:06am

Yes, I did, and the buffers are also mutually equal. There should not be any other sources of non-determinism for this model, right? So what else could be the reason? Thanks!

ptrblck · February 21, 2020, 12:52am

Could you post the model definition by wrapping the code into three backticks ```?

Also, are you using e.g. torch.backends.cudnn.deterministic = True as described here?
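A minimal sketch of the settings in question, assuming the model runs on a GPU with cuDNN (the helper name `make_deterministic` is made up for illustration). Without these flags cuDNN may pick non-deterministic algorithms for some operations, and with benchmarking enabled it may pick different algorithms from run to run, which can lead to small floating-point differences of the kind shown above.

```python
import torch


def make_deterministic(seed=0):
    """Seed the RNG and ask cuDNN to use deterministic algorithms only."""
    torch.manual_seed(seed)
    # Restrict cuDNN to deterministic algorithms.
    torch.backends.cudnn.deterministic = True
    # Disable the autotuner, which may select different algorithms per run.
    torch.backends.cudnn.benchmark = False


make_deterministic(0)
a = torch.rand(3)
make_deterministic(0)
b = torch.rand(3)
print(torch.equal(a, b))
```

Call this once at the start of the script, before building the models and running the comparison. Deterministic algorithms can be slower than the ones the autotuner would choose.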

fdokic · February 21, 2020, 2:20pm

This was it! Thank you Peter!