Significance of ReLU for FC layers

How significant is adding ReLU to fully connected layers? Is it necessary, and how does adding ReLU to FC layers affect a model's performance?

Without activation functions, your stacked linear layers collapse into a single linear mapping, since the composition of linear functions is itself linear. The model can then only represent linear functions of its input and will likely fail on anything more complex.
This chapter of the DeepLearningBook explains activation functions in more detail and uses the XOR example.
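
Here is a minimal PyTorch sketch of the collapse (layer sizes and the seed are arbitrary): two stacked `nn.Linear` layers without an activation are exactly equivalent to one merged linear layer, while inserting a `nn.ReLU` breaks that equivalence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Two stacked linear layers without an activation in between.
no_act = nn.Sequential(nn.Linear(4, 8), nn.Linear(8, 3))

# The same stack with a ReLU between the layers.
with_act = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 3))

# The activation-free stack collapses to a single linear layer:
# y = W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W1, b1 = no_act[0].weight, no_act[0].bias
W2, b2 = no_act[1].weight, no_act[1].bias
collapsed = nn.Linear(4, 3)
with torch.no_grad():
    collapsed.weight.copy_(W2 @ W1)
    collapsed.bias.copy_(W2 @ b1 + b2)

x = torch.randn(5, 4)
print(torch.allclose(no_act(x), collapsed(x), atol=1e-6))   # True: same linear map
print(torch.allclose(with_act(x), collapsed(x), atol=1e-6)) # False: ReLU adds nonlinearity
```

So without ReLU, adding more FC layers only adds parameters, not representational power; with ReLU in between, the network can approximate nonlinear functions such as XOR.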
