How significant is adding ReLU to fully connected layers? Is it necessary, and how is a model's performance affected by adding ReLU to FC layers?
Without activation functions, a stack of linear layers collapses into a single linear mapping, so the model can only represent linear functions and may fail on even simple non-linear tasks.
This chapter in the DeepLearningBook explains activation functions in more detail, using the XOR example.
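To make the collapse concrete, here is a minimal NumPy sketch (illustrative shapes and weights, not from any particular model) showing that two stacked linear layers without an activation are exactly equivalent to one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two stacked fully connected layers with no activation in between.
W1, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)
W2, b2 = rng.normal(size=(3, 8)), rng.normal(size=3)

x = rng.normal(size=4)
stacked = W2 @ (W1 @ x + b1) + b2

# The stack collapses into a single linear mapping W @ x + b.
W = W2 @ W1
b = W2 @ b1 + b2
single = W @ x + b

print(np.allclose(stacked, single))  # True
```

Inserting a ReLU between the two layers (`np.maximum(W1 @ x + b1, 0)`) breaks this equivalence, which is what lets the network model non-linear functions like XOR.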