With Torch (1.13.x), I've been trying to implement some activation functions from scratch, like Mish, ELU, etc., as custom activation functions.
However, the loss becomes NaN after about 17 epochs of training.
- dataset: official MNIST dataset from each framework
- model architecture: simple dense network (25 layers, 500 neurons each)
- lr: 1e-3 (I'd prefer not to change this)
- batch_size: 128
- optimizer: Adam
Here's my implementation:

```python
import torch as t
import torch.nn as nn

class Mish_Implementation(nn.Module):
    def __init__(self):
        super(Mish_Implementation, self).__init__()
        self.__name__ = 'Mish'

    def forward(self, x):
        # clamp the tails: mish(x) ~ 0 for very negative x, ~ x for large x
        return t.where(x < -7, t.zeros_like(x),
                       t.where(x > 30, x,
                               x * t.tanh(t.log(1 + t.exp(x)))))
```
and got this error message:
Function 'ExpBackward0' returned nan values in its 0th output.
I guess this is because the exp function overflows. But that's exactly why I used torch.where in the first place, to avoid feeding exp() values where it would return something too large.
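To check my understanding, I put together a minimal repro (plain torch, nothing from my model). It looks to me like torch.where still runs backward through *both* branches, so even though the overflowing branch is discarded in the forward pass, its gradient path can still inject NaN:

```python
import torch

x = torch.tensor([100.0], requires_grad=True)
# forward is fine: where() selects the x > 30 branch, so y = 100
y = torch.where(x > 30, x, x * torch.tanh(torch.log(1 + torch.exp(x))))
y.backward()
# backward still visits the rejected branch: exp(100) = inf there,
# and ExpBackward computes 0 * inf = nan, which poisons x.grad
print(x.grad)  # tensor([nan])
```

If that's right, it would explain why 'ExpBackward0' is the function named in the anomaly message even though the forward output never overflows.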
I want to add a trainable parameter here eventually, so making this formulation work is important to me.
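For context, the kind of trainable version I'm aiming for looks roughly like this. The name and the beta parameter are just placeholders; I've swapped log(1 + exp(x)) for F.softplus, which I understand handles large inputs internally via its threshold argument:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrainableMish(nn.Module):
    """Sketch only: mish with a hypothetical learnable scale `beta`."""
    def __init__(self):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(1))  # placeholder trainable parameter

    def forward(self, x):
        # softplus(z) = log(1 + exp(z)), computed stably for large z
        return x * torch.tanh(F.softplus(self.beta * x))
```

Would something along these lines keep the gradients finite, or is there a better pattern for this?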
Any advice is really appreciated. Thanks in advance!