Will Convolution layers without activation collapse into one layer?

In plain (fully connected) neural networks, if we don’t apply activations (e.g., ReLU) after the linear layers, the weight matrices of multiple layers mathematically collapse into a single matrix, since composing them is just matrix multiplication. I wonder whether the same is true for convolutional layers, i.e.:

If we don’t use nonlinearities after the convolutions and stack multiple layers, will they effectively degenerate into fewer layers too? Can we mathematically prove whether they will or will not?
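
By the fully connected case I mean the usual collapse (biases omitted, notation mine):

$$
y = W_2 (W_1 x) = (W_2 W_1)\,x = W x, \qquad W := W_2 W_1 .
$$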

Yes, because a convolution can be expressed as a (sparse, masked) matrix multiplication, so stacking convolutions without nonlinearities just composes linear maps, which is again a single linear map. Concretely, two stacked convolutions collapse into one convolution whose kernel is the convolution of the two kernels (of size k1 + k2 − 1).
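
Here is a minimal numerical check in PyTorch (channel counts, kernel sizes, and variable names are just made up for the demo): two bias-free convolutions applied back to back match a single convolution whose kernel is the full convolution of the two kernels, summed over the intermediate channels.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for the demo.
C_in, C_mid, C_out = 3, 8, 5
k1, k2 = 3, 5
x = torch.randn(1, C_in, 32, 32)

# Kernels of two stacked conv layers (no bias, no activation, "valid" padding).
W1 = torch.randn(C_mid, C_in, k1, k1)
W2 = torch.randn(C_out, C_mid, k2, k2)

# Path 1: two convolutions back to back.
y_two = F.conv2d(F.conv2d(x, W1), W2)

# Path 2: a single convolution with the composed kernel.
# For each (out, in) channel pair the composed kernel is the full convolution
# of the two kernels, summed over the middle channels; conv_transpose2d performs
# exactly that contraction when W2 is treated as a batch of C_out inputs.
W_combined = F.conv_transpose2d(W2, W1)  # shape (C_out, C_in, k1 + k2 - 1, k1 + k2 - 1)
y_one = F.conv2d(x, W_combined)

print((y_two - y_one).abs().max())              # tiny, equal up to float32 rounding
print(torch.allclose(y_two, y_one, atol=1e-3))  # True
```

The check also shows the output shapes agree: valid-padded convolutions with kernels of size k1 and k2 shrink the input by (k1 − 1) + (k2 − 1), exactly what a single kernel of size k1 + k2 − 1 does.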
