This is a general question I have about the relationship between the activation function and the data.
Imagine our data (images), after normalization, is centered at 0 and takes values between -1 and 1. That means our network will try to output images whose values are also between -1 and 1.
So, for example, if we use ReLU as the activation function after each layer, we are removing all negative values, so the network's outputs can never be negative and it will never be able to correctly predict our targets.
Is that right?
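To make what I mean concrete, here is a minimal sketch (layer sizes and names are just illustrative, not my actual model): with a ReLU after the last layer, the output can never be negative.

```python
import torch
import torch.nn as nn

# Hypothetical toy network, just to illustrate the concern.
net = nn.Sequential(
    nn.Linear(8, 8),
    nn.ReLU(),   # ReLU in a hidden layer
    nn.Linear(8, 3),
    nn.ReLU(),   # ReLU applied to the output as well
)

x = torch.randn(16, 8)
out = net(x)
print(out.min())  # never below 0, so targets in [-1, 1] can't all be matched
```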
If I use ReLU, should I normalize my data between 0 and 1 instead of between -1 and 1?
PS: I ask this because I have seen cases where people normalize between -1 and 1 and still use ReLU successfully.
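For reference, these are the two normalization options I have in mind (a torchvision-style sketch, assuming 3-channel images; the names are mine):

```python
from torchvision import transforms

# Option A: pixel values in [0, 1]
to_unit_range = transforms.ToTensor()  # scales uint8 images to [0, 1]

# Option B: pixel values in [-1, 1]
to_signed_range = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),  # (x - 0.5) / 0.5
])
```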
Thanks in advance