What's the difference between those two simple nn models?

The initialization does not depend on the outputs; you are right about that.
However, layers are usually initialized with “random” numbers,
which are produced by a pseudo-random number generator (PRNG).
If we seed the PRNG, it will produce the same sequence of “random” numbers after each seeding.

Now let’s create a small dummy model with just two layers: one conv layer and one linear layer.
By changing the order of these layers, we can create two models as in your example:

# model1:
- seed the PRNG
- init conv layer
- init linear layer

# model2:
- seed the PRNG
- init linear layer
- init conv layer
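The two orderings above can be sketched in PyTorch. This is a minimal sketch with hypothetical layer sizes (`nn.Conv2d(3, 6, 3)` and `nn.Linear(10, 10)` are arbitrary choices, and the default initialization is used instead of `xavier_uniform` for brevity); the point is only that the same seed plus a different call order yields different parameter values:

```python
import torch
import torch.nn as nn

def make_params(conv_first):
    # Re-seed so both runs start from the identical PRNG state.
    torch.manual_seed(0)
    if conv_first:
        conv = nn.Conv2d(3, 6, 3)   # consumes PRNG draws first
        lin = nn.Linear(10, 10)     # consumes the following draws
    else:
        lin = nn.Linear(10, 10)     # now the linear layer draws first
        conv = nn.Conv2d(3, 6, 3)
    return conv.weight.detach().clone(), lin.weight.detach().clone()

conv_a, lin_a = make_params(conv_first=True)
conv_b, lin_b = make_params(conv_first=False)

# Same seed, same init scheme, but the layers consumed the random
# stream in a different order, so the values differ.
print(torch.equal(conv_a, conv_b))  # False
print(torch.equal(lin_a, lin_b))    # False
```

Running `make_params(conv_first=True)` twice in a row would instead produce identical tensors, since the PRNG is re-seeded and called in the same order.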

Both models will have the desired initialization, e.g. xavier_uniform. However, their parameters won’t contain exactly the same numbers.
The reason is that the PRNG was called in a different order.

Have a look at this small example:

torch.manual_seed(0)
print(torch.empty(5).uniform_())
print(torch.empty(5).normal_())

# Same seed, same call order -> same results
torch.manual_seed(0)
print(torch.empty(5).uniform_())
print(torch.empty(5).normal_())

# Same seed, different call order -> different results
torch.manual_seed(0)
print(torch.empty(5).normal_())
print(torch.empty(5).uniform_())