How do I choose the value of hidden_size for a GRU?

According to some posts I have read, the hidden_size parameter of a GRU affects whether the model underfits or overfits.

I have read about some rules of thumb for choosing hidden_size:

  1. A value between the input layer size … and the output layer size, or
  2. (number of inputs + number of outputs) * (2/3)

So if I have:

  • an input sequence with the shape (10, 2) (i.e. a sequence of length 10 with 2 features)

And

  • an output with the shape (5, 1) (i.e. a sequence of length 5 with 1 feature)

Should I set hidden_size to 10, i.e. (10 + 5) * 2/3?

  • Is that right?
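It may help to pin down what "inputs" and "outputs" mean in these rules before picking a number. The sketch below applies both rules under two readings: counting time steps (10 in, 5 out), as in the question, and counting features per time step (2 in, 1 out). The second reading is what the input_size argument of PyTorch's nn.GRU refers to, assuming PyTorch is the framework in use. The helper names between_rule and two_thirds_rule are hypothetical.

```python
def between_rule(n_inputs: int, n_outputs: int) -> range:
    """Rule 1: any size from the smaller to the larger of the two counts (inclusive)."""
    lo, hi = sorted((n_inputs, n_outputs))
    return range(lo, hi + 1)


def two_thirds_rule(n_inputs: int, n_outputs: int) -> float:
    """Rule 2: (inputs + outputs) * 2/3."""
    return (n_inputs + n_outputs) * 2 / 3


# Reading A (as in the question): count time steps, 10 in and 5 out.
print(list(between_rule(10, 5)), two_thirds_rule(10, 5))

# Reading B: count features per time step, 2 in and 1 out
# (what input_size means for PyTorch's nn.GRU).
print(list(between_rule(2, 1)), two_thirds_rule(2, 1))
```

Under reading A both rules point to 10; under reading B they point to about 2, well below 32. The two readings disagree by a factor of five, so the rules alone do not settle the choice.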

I have seen many examples (on Kaggle) of GRU models with hidden_size between 32 and 256.
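One way to see why hidden_size is tied to underfitting and overfitting is to count how many trainable parameters the GRU has at each size. This is a sketch assuming PyTorch's nn.GRU layout (unidirectional, bias=True), where each layer has weight_ih of shape (3*hidden_size, in_size), weight_hh of shape (3*hidden_size, hidden_size), and two bias vectors of length 3*hidden_size. The helper gru_param_count is hypothetical and does not count any output layer placed after the GRU.

```python
def gru_param_count(input_size: int, hidden_size: int, num_layers: int = 1) -> int:
    """Trainable parameters of a unidirectional GRU with biases (PyTorch-style layout)."""
    total = 0
    for layer in range(num_layers):
        # The first layer reads the input features; deeper layers read the previous hidden state.
        in_size = input_size if layer == 0 else hidden_size
        weight_ih = 3 * hidden_size * in_size      # reset, update and new gates
        weight_hh = 3 * hidden_size * hidden_size
        biases = 2 * 3 * hidden_size               # bias_ih and bias_hh
        total += weight_ih + weight_hh + biases
    return total


# input_size=2 matches the 2 features per time step in the question.
for h in (10, 32, 64, 128, 256):
    print(f"hidden_size={h:>3}: {gru_param_count(2, h):>7} parameters")
```

With 2 input features this gives 420 parameters at hidden_size=10 and 199,680 at hidden_size=256. The count grows roughly with the square of hidden_size, so the upper end of that range leaves much more room to overfit if the training set is small.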

Given my example above, what value of hidden_size would make sense?