I am currently trying to optimize a simple NN with Optuna. Besides the learning rate, batch size, etc., I want to optimize the network architecture as well. So far I optimize the number of LSTM layers as well as the number of Dense layers. Now I am thinking about activation functions, and this is where I get confused.
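To make it concrete, here is roughly what my trial builds right now (layer sizes, ranges, and names below are simplified placeholders, not my real values):

```python
import torch.nn as nn

def define_model(trial, n_features=16, hidden=64, out_size=1):
    # Layer counts as Optuna hyperparameters (ranges are just examples)
    n_lstm = trial.suggest_int("n_lstm_layers", 1, 3)
    n_dense = trial.suggest_int("n_dense_layers", 1, 3)

    # num_layers stacks the recurrent layers inside a single nn.LSTM
    lstm = nn.LSTM(n_features, hidden, num_layers=n_lstm, batch_first=True)
    dense = nn.Sequential(
        *[nn.Linear(hidden, hidden) for _ in range(n_dense)],
        nn.Linear(hidden, out_size),
    )
    return lstm, dense
```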
Bear in mind I am very new to NNs, but I keep reading about ReLU and Leaky ReLU, and I know an LSTM uses tanh and sigmoid internally. At first I thought the internal tanh might get swapped out for a ReLU, but I think I got that wrong, right? What I have seen is that nn.ReLU() gets applied in between layers, so I would think it only makes sense to apply it between my Dense layers?
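If that's right, what I'm imagining is letting Optuna pick the activation as a categorical hyperparameter and inserting it only between the Dense layers, leaving the LSTM's internal gates alone. A rough sketch of what I mean (the activation choices and sizes are again just examples):

```python
import torch.nn as nn

def define_dense_head(trial, in_size=64, hidden=64, out_size=1):
    n_dense = trial.suggest_int("n_dense_layers", 1, 3)
    # Suggest the activation by name; plain strings keep the choice serializable
    act_name = trial.suggest_categorical("activation", ["relu", "leaky_relu", "tanh"])
    act_cls = {"relu": nn.ReLU, "leaky_relu": nn.LeakyReLU, "tanh": nn.Tanh}[act_name]

    layers, size = [], in_size
    for _ in range(n_dense):
        layers.append(nn.Linear(size, hidden))
        layers.append(act_cls())  # activation sits between the Linear layers
        size = hidden
    layers.append(nn.Linear(size, out_size))  # no activation on the final output
    return nn.Sequential(*layers)
```

The dict mapping from name to class is just so the categorical values stay plain strings instead of module objects. Is that roughly the right way to think about where the activation goes?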
Sorry for the noob question. I am having trouble understanding these things because they are so basic that they are hardly discussed anywhere.