In my opinion, if the models are trained independently, it is not meaningful to combine the neural networks this way, because there is no correlation or common reference point between them. That is, there is no guarantee that the first layer of the first network learns the same features, in the same order, as the first layer of the second network.
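To make the "same features, different order" problem concrete, here is a minimal NumPy sketch (all names and sizes are hypothetical). Permuting the hidden units of a small two-layer net yields a second set of weights that computes exactly the same function, yet naively averaging the two weight sets produces a different function:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0)

# Hypothetical tiny 2-layer net: y = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
y = W2 @ relu(W1 @ x)

# Permute the hidden units: an equally valid network computing the SAME function
perm = [2, 0, 3, 1]
W1p = W1[perm, :]       # reorder rows of the first layer
W2p = W2[:, perm]       # reorder columns of the second layer to match
yp = W2p @ relu(W1p @ x)
assert np.allclose(y, yp)  # identical outputs despite different weights

# Naively averaging the two (functionally identical) weight sets
# gives a network that generally computes something else entirely
W1_avg = (W1 + W1p) / 2
W2_avg = (W2 + W2p) / 2
y_avg = W2_avg @ relu(W1_avg @ x)
```

Two independently trained networks are, in effect, in arbitrary and unrelated permutations like this, which is why elementwise weight averaging has no guaranteed meaning.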
It is technically possible to do, as you note, and the resulting weights may still turn out better than random initialization.