Why does `nn` have `ReLU`, `Tanh` and `Sigmoid` non-linear layers but not `Atan`?

I mean these are all non-linear transformations, and they can all be handily accessed with Tensor.xxx (for ReLU it’s Tensor.clip(min=0.0), so only a minor difference). It looks to me like they share many similarities. Hence the question: is there any special reason that atan only exists as a utility function and a tensor member function, but not as an nn.Module like nn.Tanh?

I don’t think there is any special reason but atan is not that famous…

Are there any advantages? Converging to pi / 2 is not effective in training.

I think atan is “slightly” better than tanh in some cases since its derivative decays more slowly (1/x^2 decay vs. exponential decay), so it suffers less from the dead neuron problem, at least on paper. In practice it all depends on the specific problem, of course. I don’t think converging to pi/2 rather than 1 or some other constant is an issue, as the output can easily be rescaled.
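To make the decay comparison concrete, here is a minimal sketch in plain Python (no PyTorch needed). It evaluates the closed-form derivatives, 1 / (1 + x^2) for atan and 1 - tanh(x)^2 for tanh, at a few points. The helper names `atan_grad`, `tanh_grad` and `scaled_atan` are made up for illustration, and the 2/pi factor is just one possible way to rescale the output range to (-1, 1).

```python
import math

def atan_grad(x: float) -> float:
    """d/dx atan(x) = 1 / (1 + x^2), which decays polynomially."""
    return 1.0 / (1.0 + x * x)

def tanh_grad(x: float) -> float:
    """d/dx tanh(x) = 1 - tanh(x)^2, which decays exponentially."""
    return 1.0 - math.tanh(x) ** 2

def scaled_atan(x: float) -> float:
    """atan rescaled by 2/pi (an illustrative choice) so it saturates at +/-1."""
    return (2.0 / math.pi) * math.atan(x)

# Print both gradients and the rescaled activation at increasing inputs.
for x in (0.0, 2.0, 5.0, 10.0):
    print(
        f"x={x:5.1f}  atan'={atan_grad(x):.2e}  "
        f"tanh'={tanh_grad(x):.2e}  scaled_atan={scaled_atan(x):.4f}"
    )
```

Both gradients equal 1 at x = 0, but for large inputs the tanh gradient shrinks much faster than the atan gradient. Whether that helps in an actual training run still depends on the problem.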

Oh, I see.
But PyTorch doesn’t support atan yet.

PyTorch does support atan.

You can easily define a custom module via:

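import torch
import torch.nn as nn

class Atan(nn.Module):
    """Element-wise arctangent wrapped as a module."""
    def __init__(self):
        super().__init__()

    def forward(self, x):
        return torch.atan(x)

model = nn.Sequential(
    nn.Linear(10, 10),
    Atan(),
    nn.Linear(10, 2)
)

x = torch.randn(1, 10)
out = model(x)
out.mean().backward()  # populate .grad before inspecting it
print([(name, p.grad.abs().sum()) for name, p in model.named_parameters()])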

If you think this is a widely used non-linearity and deserves its own implementation, feel free to create a feature request on GitHub.

Thanks for your reply.
What I meant is that there is no nn.Atan() yet :)

Thanks. That I understand, and it’s exactly what I’m doing now in my code. This question is purely out of curiosity about why it isn’t supported in the nn library out of the box like the other common non-linear activation functions. I do agree that atan is somewhat less common than tanh in the context of activation functions. Maybe that’s just the reason.

Yes, I think you are right and it might not be common enough (so far).
If you create the feature request, make sure to add some papers or popular repositories using atan as an activation function.
We had a similar use case for nn.Identity which is trivial to implement, but was commonly used so that a built-in module made sense.