Why is ReLU an nn.Module but scaled_dot_product_attention is not?

Hi, I was wondering what the difference between scaled_dot_product_attention and ReLU is that makes one callable as a module (nn.ReLU) while the other is only available in the functional submodule (torch.nn.functional).
Is it that, even though ReLU is stateless, it takes only one input, which makes it easy to use in sequential models? Or is it that scaled_dot_product_attention is almost always used inside the MultiheadAttention module, removing the need to wrap it in a module of its own?
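For reference, here is a minimal sketch of the two calling conventions I mean (assuming PyTorch >= 2.0, where scaled_dot_product_attention was added to torch.nn.functional):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2, 4)

# Module form: an object you can drop into nn.Sequential
relu = nn.ReLU()
y1 = relu(x)

# Functional form: a plain function, same result
y2 = F.relu(x)
assert torch.equal(y1, y2)

# SDPA only ships in functional form
q = k = v = torch.randn(1, 2, 8, 16)  # (batch, heads, seq_len, head_dim)
out = F.scaled_dot_product_attention(q, k, v)
```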
Thank you 🙂

IMO there’s no strong reason ReLU needs to be a module either, though, as you mention, having more things as modules means you can act on them in a more uniform fashion (e.g. place them in nn.Sequential, or find them when iterating over a model’s submodules).
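And if you want that uniformity for attention too, nothing stops you from wrapping the functional call in a tiny module yourself. A minimal sketch (the ScaledDotProductAttention class name is my own, not something PyTorch ships):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductAttention(nn.Module):
    """Hypothetical thin wrapper so SDPA can live in a module hierarchy."""

    def __init__(self, dropout_p: float = 0.0):
        super().__init__()
        self.dropout_p = dropout_p

    def forward(self, q, k, v, attn_mask=None, is_causal=False):
        # Only apply attention dropout in training mode, mirroring nn.Dropout
        p = self.dropout_p if self.training else 0.0
        return F.scaled_dot_product_attention(
            q, k, v, attn_mask=attn_mask, dropout_p=p, is_causal=is_causal
        )

sdpa = ScaledDotProductAttention(dropout_p=0.1)
q = k = v = torch.randn(1, 2, 8, 16)
out = sdpa(q, k, v)
```

The wrapper buys you the usual module conveniences (showing up in repr/state inspection, train/eval-aware dropout) at the cost of one extra class, which is roughly the trade-off nn.ReLU makes over F.relu.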
