If we add a conv layer with 64 filters, how are they actually optimized differently, so that they don't all end up with the same values?
Since all filters are initialized with different random weights, they receive different gradients during training and therefore converge to different final values. If you initialized all of them with the same static value(s), every filter would receive the exact same gradient in each step (the symmetry would never be broken), so they would stay identical and the layer would effectively collapse to a single filter.
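Here is a tiny NumPy sketch of that symmetry argument (the toy model, patch values, and weights are made up for illustration, not from any real layer): each "filter" sees the same input, so identical weights produce identical gradients, while distinct weights produce distinct gradients.

```python
import numpy as np

# Toy model: three "filters" that all see the same input patch x.
# h_i = tanh(W[i] . x), y = v . h, loss = (y - t)^2
x = np.array([1.0, -0.5, 0.25, 2.0])   # one hypothetical input patch
t = 1.0                                # regression target

def filter_grads(W, v):
    h = np.tanh(W @ x)
    y = v @ h
    dy = 2.0 * (y - t)                         # dL/dy
    return dy * (v * (1 - h**2))[:, None] * x  # dL/dW, one row per filter

# Identical init: by symmetry every filter receives the exact same
# gradient, so gradient descent can never pull them apart.
g_same = filter_grads(np.full((3, 4), 0.5), np.full(3, 0.5))
print(np.allclose(g_same[0], g_same[1]))   # True -> filters stay identical

# Distinct (e.g. random) init: each filter receives a different
# gradient, so the filters drift toward different features.
W_diff = 0.1 * np.arange(12.0).reshape(3, 4)
v_diff = np.array([0.3, -0.2, 0.7])
g_diff = filter_grads(W_diff, v_diff)
print(np.allclose(g_diff[0], g_diff[1]))   # False -> filters diverge
```

In a real conv layer the same logic applies per filter: frameworks like PyTorch break the symmetry for you by drawing each filter's weights from a random distribution at initialization.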