In RNN, GRU, and LSTM, I seem to only have control over the hidden state size, which also dictates the output size. Isn't this a limitation? It seems to me that with this setup, the complexity of the network is dictated by the output size. My question is: is there any way to make things more flexible than these arguments?
You can apply a linear layer to the output of the RNN to get whatever output dimension you want. That's exactly what the implementation of an LSTM or GRU cell would have done internally if it had provided an output_size argument.
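
For example, here's a minimal sketch of that pattern (the module name and sizes are just illustrative):

```python
import torch
import torch.nn as nn

class LSTMWithProjection(nn.Module):
    """LSTM whose hidden size is decoupled from the final output size
    via a trailing nn.Linear projection."""
    def __init__(self, input_size, hidden_size, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # out has shape (batch, seq_len, hidden_size)
        out, (h_n, c_n) = self.lstm(x)
        # project every timestep down (or up) to the desired output dimension
        return self.proj(out)

# hidden_size controls the model's capacity, output_size is independent
model = LSTMWithProjection(input_size=16, hidden_size=128, output_size=10)
x = torch.randn(4, 32, 16)  # (batch, seq_len, input_size)
y = model(x)
print(y.shape)              # torch.Size([4, 32, 10])
```

This way you can grow hidden_size for capacity without touching the output dimension, and vice versa.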