Will Convolution layers without activation collapse into one layer?

In plain fully connected networks, if we don’t apply activations (e.g., ReLU) after the weight layers, the weight matrices of multiple layers mathematically collapse into a single matrix, since composing them is just matrix multiplication. I wonder whether the same is true for convolutional layers, i.e.:

If we don’t use nonlinearities after the convolutions and stack multiple layers, will they effectively degenerate into fewer layers too? Can we mathematically prove whether they will or will not?
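
For concreteness, here is a minimal sketch of the fully connected case I’m referring to (PyTorch and the layer sizes are just illustrative):

```python
import torch

torch.manual_seed(0)

# Two linear layers with no activation in between.
W1 = torch.randn(16, 32)   # first layer: 32 -> 16
W2 = torch.randn(8, 16)    # second layer: 16 -> 8
x = torch.randn(32)

# Stacking without a nonlinearity is the same as a single layer with weights W2 @ W1.
print(torch.allclose(W2 @ (W1 @ x), (W2 @ W1) @ x, atol=1e-5))  # True
```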

Yes. A convolution (with no bias or nonlinearity) can be expressed as a masked matrix multiplication, i.e. multiplication by a sparse matrix with shared weights. Stacking two such layers therefore just multiplies two matrices, so the stack collapses to a single linear map, exactly as in the fully connected case.
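
A minimal sketch of that argument (PyTorch; the single channel and 8x8 input size are purely illustrative). We recover the dense matrix of each conv by feeding in unit basis vectors, and check that the stacked convs act as the product of the two matrices:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two 3x3 convolutions with no bias and no nonlinearity in between.
conv1 = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
conv2 = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

H = W = 8
n = H * W

def to_matrix(f):
    # Column i of the matrix is f applied to the i-th unit basis vector.
    basis = torch.eye(n)
    cols = [f(basis[i].view(1, 1, H, W)).reshape(n) for i in range(n)]
    return torch.stack(cols, dim=1)

with torch.no_grad():
    M1 = to_matrix(conv1)
    M2 = to_matrix(conv2)
    M_stack = to_matrix(lambda x: conv2(conv1(x)))

# The stacked convs act as the single matrix M2 @ M1, i.e. one linear map.
print(torch.allclose(M_stack, M2 @ M1, atol=1e-5))  # True
```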


Suppose you stack two 3x3 convolutions. To compute each output pixel, you now have to look at the neighbors of the neighbors as well, so the effective kernel size grows to 5x5. The stack does collapse mathematically, but not for free computationally: a single 5x5 convolution requires 25 multiplies per output pixel, while the two 3x3 convolutions require only 9 + 9 = 18.
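
A quick numerical check of the 5x5 claim (PyTorch; single channel, no bias, no padding, random kernels and input, all purely illustrative). The effective kernel of the stack is the full convolution of the two 3x3 kernels, which has size 3 + 3 - 1 = 5:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Single-channel, bias-free 3x3 kernels applied with no padding ("valid" convs).
k1 = torch.randn(1, 1, 3, 3)
k2 = torch.randn(1, 1, 3, 3)
x = torch.randn(1, 1, 10, 10)

# Stacked: 10x10 -> 8x8 -> 6x6, no nonlinearity in between.
stacked = F.conv2d(F.conv2d(x, k1), k2)

# Equivalent single kernel: the full convolution of k1 with k2.
# F.conv2d is a cross-correlation, so flip k2 to turn it into a true convolution.
k_eff = F.conv2d(F.pad(k1, (2, 2, 2, 2)), torch.flip(k2, dims=[2, 3]))

single = F.conv2d(x, k_eff)  # 10x10 -> 6x6 in one 5x5 conv

print(k_eff.shape)                                 # torch.Size([1, 1, 5, 5])
print(torch.allclose(stacked, single, atol=1e-4))  # True
```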