Hi. So far, I've computed all activations by initializing module layers (e.g. nn.ReLU()).
If I replace them with an in-place functional call for the activation (say nn.functional.relu_), will backpropagation still work correctly?
If it does, isn't it more advantageous to compute activations this way, without initializing separate activation layers?
If not, what exactly is the difference between the module and functional versions?
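To make the question concrete, here is a minimal sketch of the two variants I mean (the layer sizes, seed, and single Linear layer are just for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(4, 3)

lin = nn.Linear(3, 2)
act = nn.ReLU()  # variant 1: a module layer created at init time

# Variant 1: module-based activation
out1 = act(lin(x))
out1.sum().backward()
g_module = lin.weight.grad.clone()
lin.weight.grad = None  # reset before the second pass

# Variant 2: in-place functional activation, no separate layer object
out2 = F.relu_(lin(x))
out2.sum().backward()
g_func = lin.weight.grad.clone()

# Both paths produce the same weight gradients
print(torch.allclose(g_module, g_func))
```

In this small case the gradients come out identical, which is what prompted the question about whether the module layer is ever necessary.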