Will Convolution layers without activation collapse into one layer?

In plain fully connected networks, if we don’t apply activations (e.g., ReLU) after the weight layers, the weight matrices of multiple layers mathematically collapse into a single matrix, since composing them is just matrix multiplication. I wonder whether the same is true for convolutional layers, i.e.:

If we don’t use nonlinearities after the convolutions and stack multiple layers, will they effectively degenerate into fewer layers too? Can we mathematically prove whether they will or will not?
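
For concreteness, here is a minimal sketch of the fully connected case I’m referring to (PyTorch and the layer sizes are just illustrative):

```python
import torch

torch.manual_seed(0)

# Two linear layers with no activation in between.
W1 = torch.randn(16, 32)   # first layer: 32 -> 16
W2 = torch.randn(8, 16)    # second layer: 16 -> 8
x = torch.randn(32)

# Stacking without a nonlinearity is the same as a single layer with weights W2 @ W1.
print(torch.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x, atol=1e-5))  # True
```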

Yes. A convolution (with no bias or nonlinearity) can be expressed as a masked matrix multiplication, i.e. multiplication by a sparse matrix with shared weights. Stacking two such layers therefore just multiplies two matrices, so the stack collapses to a single linear map, exactly as in the fully connected case.
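
A minimal sketch of that argument (PyTorch; the single channel and 8x8 input size are purely illustrative). We recover the dense matrix of each conv by feeding in unit basis vectors, and check that the stacked convs act as the product of the two matrices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two 3x3 convolutions with no bias and no nonlinearity in between.
conv1 = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
conv2 = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

H = W = 8
n = H * W

def to_matrix(f):
    # Column i of the matrix is f applied to the i-th unit basis vector.
    basis = torch.eye(n)
    cols = [f(basis[i].view(1, 1, H, W)).reshape(n) for i in range(n)]
    return torch.stack(cols, dim=1)

with torch.no_grad():
    M1 = to_matrix(conv1)
    M2 = to_matrix(conv2)
    M_stack = to_matrix(lambda x: conv2(conv1(x)))

# The stacked convs act as the single matrix M2 @ M1, i.e. one linear map.
print(torch.allclose(M_stack, M2 @ M1, atol=1e-5))  # True
```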


Suppose you stack two 3x3 convolutions. To compute each output pixel, you now have to look at the neighbors of the neighbors as well, so the effective kernel size grows to 5x5. The stack does collapse mathematically, but not for free computationally: a single 5x5 convolution requires 25 multiplies per output pixel, while the two 3x3 convolutions require only 9 + 9 = 18.
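
A quick numerical check of the 5x5 claim (PyTorch; single channel, no bias, no padding, random kernels and input, all purely illustrative). The effective kernel of the stack is the full convolution of the two 3x3 kernels, which has size 3 + 3 - 1 = 5:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Single-channel, bias-free 3x3 kernels applied with no padding ("valid" convs).
k1 = torch.randn(1, 1, 3, 3)
k2 = torch.randn(1, 1, 3, 3)
x = torch.randn(1, 1, 10, 10)

# Stacked: 10x10 -> 8x8 -> 6x6, no nonlinearity in between.
stacked = F.conv2d(F.conv2d(x, k1), k2)

# Equivalent single kernel: the full convolution of k1 with k2.
# F.conv2d is a cross-correlation, so flip k2 to turn it into a true convolution.
k_eff = F.conv2d(F.pad(k1, (2, 2, 2, 2)), torch.flip(k2, dims=[2, 3]))

single = F.conv2d(x, k_eff)  # 10x10 -> 6x6 in one 5x5 conv

print(k_eff.shape)                                 # torch.Size([1, 1, 5, 5])
print(torch.allclose(stacked, single, atol=1e-4))  # True
```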