According to some posts I read, the hidden_size parameter of a GRU affects whether the model underfits or overfits.
I read about some rules of thumb for choosing hidden_size:
- value between the input layer size … and the output layer size.
- or (number of inputs + number of outputs) * 2/3
So if I have:
- an input sequence with shape (10, 2) (i.e., a sequence of length 10 with 2 features)
and
- an output with shape (5, 1) (i.e., a sequence of length 5 with 1 feature)
Do I need to set hidden_size to 10, i.e. (10 + 5) * 2/3?
Am I right?
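To make the arithmetic explicit, here is a small sketch of the two rules of thumb applied to my example. The function names are my own (hypothetical helpers, not from any library). I am also not sure which numbers the rules expect: as far as I understand, a GRU's input size is the number of features per time step rather than the sequence length, so I computed both readings.

```python
def two_thirds_rule(n_inputs: int, n_outputs: int) -> int:
    """Rule of thumb: (inputs + outputs) * 2/3, rounded to the nearest integer."""
    return round((n_inputs + n_outputs) * 2 / 3)


def between_rule(n_inputs: int, n_outputs: int) -> range:
    """Rule of thumb: any size between the output and input sizes (inclusive)."""
    low, high = sorted((n_inputs, n_outputs))
    return range(low, high + 1)


# Reading 1: plug in the sequence lengths (10 in, 5 out), as in my question.
by_length = two_thirds_rule(10, 5)

# Reading 2: plug in the feature counts per time step (2 in, 1 out).
by_features = two_thirds_rule(2, 1)

print(by_length, by_features)  # prints: 10 2
```

The two readings give very different values (10 versus 2), which is part of why I am unsure the rule applies here.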
I saw a lot of examples (from Kaggle) of models using a GRU with hidden_size between 32 and 256.
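To get a feeling for how much these values differ in model capacity, here is a rough sketch that counts the trainable parameters of a single GRU layer for a few hidden sizes. It assumes the layout used by PyTorch's `nn.GRU` (one layer, unidirectional, three gates, two bias vectors per gate) and my 2 input features. The helper name `gru_param_count` is my own, not a library function.

```python
def gru_param_count(input_size: int, hidden_size: int) -> int:
    """Trainable parameters of one single-layer, unidirectional GRU.

    Assumes the PyTorch nn.GRU layout: three gates, each with an
    input weight matrix, a hidden weight matrix, and two bias vectors.
    """
    return 3 * hidden_size * (input_size + hidden_size + 2)


# Compare the rule-of-thumb value (10) with the range seen in examples.
for hidden_size in (10, 32, 256):
    print(hidden_size, gru_param_count(input_size=2, hidden_size=hidden_size))
# prints:
# 10 420
# 32 3456
# 256 199680
```

The count grows roughly with the square of hidden_size, so 256 gives a model several hundred times larger than 10.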
For my example above, what would be a sensible value for hidden_size?