I’m learning about RNNs from the Udacity PyTorch class, and I think I understand the basic concept. Now I’m trying to familiarize myself with the details, starting with the dimensions of the inputs and outputs of the different layers in an RNN.
- input_size – The number of expected features in the input x.
- hidden_size – The number of features in the hidden state h.
- num_layers – Number of recurrent layers.
I copied these parameter descriptions from the docs, but I’m having a hard time visualizing my network based on these parameters.
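To make my question concrete, here’s a minimal example I put together to poke at the shapes (the sizes are arbitrary ones I picked; the comments are my guesses about what each dimension means):

```python
import torch
import torch.nn as nn

# A single-layer RNN: each time step's input has 10 features,
# and the hidden state has 20 features.
rnn = nn.RNN(input_size=10, hidden_size=20, num_layers=1, batch_first=True)

# Dummy batch: 3 sequences, each 5 time steps long, 10 features per step.
x = torch.randn(3, 5, 10)

out, h = rnn(x)
print(out.shape)  # torch.Size([3, 5, 20]) -- an output for every time step
print(h.shape)    # torch.Size([1, 3, 20]) -- final hidden state, per layer
```

So I can see *what* the shapes are, but not *why* they are what they are.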
- What do the docs mean by “features in the input x”? Does it refer to having multiple input variables per time step, or something like the multiple channels of an image?
- What are “features in the hidden state”? Again, the word “features” seems ambiguous, and I don’t know what it means in this context.
I get that the hidden state has some “memory” of its previous self: it’s updated by combining the current input to the hidden layer with the previous hidden state. But I don’t understand the specifics of what’s going on. For example, here is a question from the class that I have no idea how to answer:
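Here’s my rough mental model of a single update step for a vanilla RNN, sketched by hand (the weight names and the omission of biases are my own simplifications, not PyTorch’s exact internals):

```python
import torch

torch.manual_seed(0)
input_size, hidden_size = 10, 20

# Hypothetical weights for one vanilla-RNN step (names are mine, for illustration).
W_ih = torch.randn(hidden_size, input_size)   # input -> hidden
W_hh = torch.randn(hidden_size, hidden_size)  # hidden -> hidden

x_t = torch.randn(input_size)      # current input: `input_size` features
h_prev = torch.zeros(hidden_size)  # previous hidden state: `hidden_size` features

# The new hidden state combines the current input with the previous hidden state.
h_t = torch.tanh(W_ih @ x_t + W_hh @ h_prev)
print(h_t.shape)  # torch.Size([20])
```

If this picture is wrong, that might be the root of my confusion.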
> Say you’ve defined a GRU layer with input_size=100, hidden_size=20, and num_layers=1. What will the dimensions of the hidden state be if you’re passing in data, batch first, in batches of 3 sequences at a time?
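I did try instantiating the quiz’s GRU and printing the hidden state’s shape, but I don’t understand why the dimensions come out the way they do (the sequence length of 4 is an arbitrary choice of mine; the question doesn’t specify it):

```python
import torch
import torch.nn as nn

# The GRU from the quiz question.
gru = nn.GRU(input_size=100, hidden_size=20, num_layers=1, batch_first=True)

# Batch-first input: 3 sequences, 4 time steps each, 100 features per step.
x = torch.randn(3, 4, 100)

out, h = gru(x)
print(h.shape)  # torch.Size([1, 3, 20])
```

In particular, I’m confused about why the hidden state is *not* batch-first even though I set batch_first=True.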
Unfortunately, RNNs are not explained as well as CNNs were in the previous lesson. Thanks for your help!