A backprop question about activations

Hi. So far, I’ve computed all activations by initialising module-based activation layers (e.g. nn.ReLU) in __init__.

If I replace them with an in-place functional call for the activation (say nn.functional.relu_), will backpropagation still work correctly?
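
To make the question concrete, here is a minimal sketch of the two variants I mean (the layer sizes are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# What I do now: the activation is a module initialised in __init__
class ModularNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.act = nn.ReLU()
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        return self.fc2(self.act(self.fc1(x)))

# What I'm asking about: an in-place functional call, no separate activation layer
class FunctionalNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, 20)
        self.fc2 = nn.Linear(20, 1)

    def forward(self, x):
        return self.fc2(F.relu_(self.fc1(x)))
```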

If it does, isn’t it more convenient to compute activations this way, without initialising separate activation layers?

If not, what is the exact difference between the modular and functional layers?

  1. Backpropagation will still work. Autograd tracks functional and in-place ops just like module calls, and if an in-place op ever overwrites a tensor that is needed for the backward pass, autograd raises a RuntimeError rather than computing wrong gradients (see the sketch after this list).

  2. The functional and modular APIs have different philosophies: a module such as nn.ReLU is an object you create once and can drop into containers like nn.Sequential, while nn.functional.relu is a stateless call you make directly in forward. For parameter-free ops like ReLU they compute the same thing. Personally, yes, I find the functional style cleaner.

  3. It’s a matter of personal preference.
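
A quick sanity check (toy sizes, just to illustrate that gradients flow through the in-place functional call):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(4, 4)
x = torch.randn(2, 4)

out = F.relu_(fc(x))        # in-place ReLU applied to the linear output
out.sum().backward()

print(fc.weight.grad is not None)  # True: autograd tracked the in-place op
```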

Happy hacking!