I’m not sure how to interpret your “model” definition:
RELU (w1x + w2x)
as it seems you are trying to sum the outputs of both layers, which is unusual.
Assuming you want to use two consecutive linear layers with a final ReLU, your model would be defined as:
out = relu(w2 @ (w1 @ x))
In this case you are right and the two linear layers can be seen as a single one, since no activation function is used between them: w2 @ (w1 @ x) equals (w2 @ w1) @ x, so a single layer with the weight w2 @ w1 produces the same output.
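As a quick sanity check, here is a minimal sketch (bias-free layers, made-up shapes) showing that the two matrix multiplications fuse into one:

```python
import torch

# Hypothetical shapes: batch of 16 samples, input dim 4, hidden dim 8, output dim 3
x = torch.randn(16, 4)
w1 = torch.randn(8, 4)
w2 = torch.randn(3, 8)

out_two_layers = x @ w1.T @ w2.T      # w2 @ (w1 @ x), written batch-first
out_single_layer = x @ (w2 @ w1).T    # one layer using the fused weight w2 @ w1

print(torch.allclose(out_two_layers, out_single_layer, atol=1e-6))  # True
```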
However, collapsing these layers is not possible if an activation function is used between them, which would be the common approach:
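A minimal sketch of that common setup (the layer sizes are made up for illustration):

```python
import torch
import torch.nn as nn

# With a nonlinearity between the layers, no single linear layer can
# reproduce the model, since relu(w1 @ x) is not a linear function of x.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),        # activation between the two linear layers
    nn.Linear(8, 3),
)

x = torch.randn(16, 4)
out = model(x)
print(out.shape)  # torch.Size([16, 3])
```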