nn.Linear without activation

hi, I’m trying to build a denoising autoencoder, and have a question - is it a bad idea to have a model without activations?

i.e. what I do is basically:
Linear(100,1000) -> Linear(1000,1000) -> Linear(1000,100)
I also tried it with ReLU, i.e.:
Linear -> ReLU -> Linear -> ReLU -> Linear

but the one without activations seems to work better on the validation set (it converges faster and reaches a lower MSE loss)

So I would like to understand if what I do (Linear without ReLU) is conceptually “ok” or “wrong”
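
In PyTorch terms the two variants are roughly as follows (a sketch; the noise injection and training loop are left out):

```python
import torch.nn as nn

# Variant without activations (the one that converges faster here):
dae_linear = nn.Sequential(
    nn.Linear(100, 1000),
    nn.Linear(1000, 1000),
    nn.Linear(1000, 100),
)

# Variant with ReLU between the layers:
dae_relu = nn.Sequential(
    nn.Linear(100, 1000),
    nn.ReLU(),
    nn.Linear(1000, 1000),
    nn.ReLU(),
    nn.Linear(1000, 100),
)
```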

A composition of linear functions is itself a linear function.
This means that without non-linear activations, your entire network can be replaced by a single linear layer.
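
Concretely, for two layers: W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is again a single affine map. A ReLU in between is exactly what prevents this factorization.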

thanks, so other than that I guess it’s “ok” - the network seems to perform well, so I will probably keep it

will test as well with just one layer, though I’m not sure how wide it should be … unless my math is wrong I now have around 1.2M parameters (the 1000*1000 middle layer alone is 1M; can’t check right now), so to replace it with a single layer in the middle I would need:
Linear(100, 1M) -> Linear(1M, 100) ?
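
For what it’s worth, the single equivalent layer would only need to be Linear(100, 100): without activations the whole stack collapses to one affine map, so no 1M-wide hidden layer is required. A sketch of the collapse (a numerical check, not part of the original model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# The three-layer stack from above, without activations:
stack = nn.Sequential(
    nn.Linear(100, 1000),
    nn.Linear(1000, 1000),
    nn.Linear(1000, 100),
)

W1, b1 = stack[0].weight, stack[0].bias
W2, b2 = stack[1].weight, stack[1].bias
W3, b3 = stack[2].weight, stack[2].bias

# Compose the affine maps: W = W3 W2 W1, b = W3 (W2 b1 + b2) + b3.
collapsed = nn.Linear(100, 100)
with torch.no_grad():
    collapsed.weight.copy_(W3 @ W2 @ W1)
    collapsed.bias.copy_(W3 @ (W2 @ b1 + b2) + b3)

x = torch.randn(8, 100)
# Same outputs up to float rounding:
print(torch.allclose(stack(x), collapsed(x), atol=1e-4))  # True
```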