What's the difference between those two simple nn models?

Yes, both models are architecturally identical.
Even though the exact initial parameter values might differ, you should get approximately the same training result with either model.

After seeding the pseudorandom number generator (PRNG), you’ll get the same sequence of random numbers on every run.
The layer that is initialized first will consume the first “random” numbers for its initialization, while the second layer will get the subsequent ones.
If you change the layer order (as in my example, by changing the function call order), the assignment of random numbers to layers changes as well. This is a simplified picture: if your layers have different numbers of parameters, changing the order shifts which part of the random sequence each layer receives instead of simply swapping the values between layers.
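To make the ordering effect concrete, here is a minimal, framework-free sketch. `init_layer` and `build_model` are hypothetical helpers, and each "layer" is just a list of uniform draws from one seeded `random.Random`; this does not reproduce PyTorch's actual init schemes, it only mimics how layers consume a shared PRNG stream in order.

```python
import random

def init_layer(rng, n_params):
    """Hypothetical stand-in for a layer's weight init: draws n_params values
    from the shared generator, consuming the stream in order."""
    return [rng.uniform(-1.0, 1.0) for _ in range(n_params)]

def build_model(order, sizes, seed=0):
    """Hypothetical helper: seed one PRNG, then initialize layers in `order`."""
    rng = random.Random(seed)
    return {name: init_layer(rng, sizes[name]) for name in order}

# Same number of parameters: swapping the order swaps the values between layers.
same = {"fc1": 3, "fc2": 3}
a = build_model(["fc1", "fc2"], same)
b = build_model(["fc2", "fc1"], same)
print(a["fc1"] == b["fc2"], a["fc2"] == b["fc1"])  # True True

# Different numbers of parameters: both models consume the same stream,
# but it is split at a different point, so values are shifted, not swapped.
diff = {"fc1": 2, "fc2": 4}
c = build_model(["fc1", "fc2"], diff)
d = build_model(["fc2", "fc1"], diff)
print(c["fc1"] + c["fc2"] == d["fc2"] + d["fc1"])  # True
```

Assuming the layers draw from the default generator after a call like `torch.manual_seed`, reordering their construction in PyTorch should behave analogously.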
