In plain (fully connected) neural networks, if we don’t apply activations (e.g., ReLU) after the weights, the weight matrices of multiple layers mathematically collapse into a single matrix, since composing linear maps is just one matrix multiplication. I wonder whether the same is true for convolutional layers. That is:
If we don’t use nonlinearities after the convolutions and stack multiple layers, will they effectively collapse into fewer layers too? Can we mathematically prove whether they do or do not?
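For intuition, here is a quick 1-D, single-channel sanity check I can sketch in plain NumPy (ignoring padding and stride, and using full convolution). It seems to confirm the collapse, since convolution is associative: applying two kernels in sequence matches applying their composition once.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(50)   # input signal
k1 = rng.standard_normal(3)   # kernel of "layer 1"
k2 = rng.standard_normal(5)   # kernel of "layer 2"

# Two stacked linear conv layers (no activation in between)
two_layers = np.convolve(np.convolve(x, k1), k2)

# One layer with the composed kernel k1 * k2 (size 3 + 5 - 1 = 7)
k = np.convolve(k1, k2)
one_layer = np.convolve(x, k)

print(np.allclose(two_layers, one_layer))  # True
```

This is only a numerical check for the 1-D case, not a proof; presumably a full argument would have to handle multiple channels (where the composed kernel involves a sum over the intermediate channels) and boundary effects from padding.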