As far as I understand, nn.ReLU() is a layer that has weights and bias, whereas F.relu is just an activation function. Doesn't that make nn.ReLU() a bit more computationally heavy than F.relu, because the optimizer has to update the redundant weights and bias for that layer too?
nn.ReLU() is a layer, but it has no weights or bias.
The two are exactly the same.
The nn.Module version is convenient because you can add it directly into an
nn.Sequential() construct, for example.
The functional version is useful when you write a custom forward and you just want to apply a relu.
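A quick sketch of both points (assuming PyTorch is installed): the two forms compute exactly the same thing, nn.ReLU() carries no parameters, and each fits its typical use case:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(4, 8)

# Module form: drops straight into an nn.Sequential container.
seq = nn.Sequential(nn.Linear(8, 8), nn.ReLU())

# Functional form: applied inline in a custom forward().
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 8)

    def forward(self, x):
        return F.relu(self.fc(x))

# The two ReLUs produce identical outputs ...
assert torch.equal(nn.ReLU()(x), F.relu(x))

# ... and the module version has no weights or bias for the
# optimizer to update.
assert len(list(nn.ReLU().parameters())) == 0
```

So there is no extra optimizer cost either way; the choice is purely a matter of style.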
Thank you for your answer.
Be warned: for my search "nn.ReLU or F.relu ?", Google shows the snippet "relu() is a layer that has weights and bias whereas F. relu is just a activation function." instead of @Huy_Ngo's answer, so anyone who does not click through to the page will see a wrong answer.