Hidden layer depth in GRU

Hi all, I have a question about the number of hidden layers in a GRU module. In the code below I use the GRU's num_layers parameter to increase the number of 'hidden layers' I can have in my init_hidden. However, from what I've read, this parameter actually refers to stacking multiple GRUs on top of each other. To me this sounds as if setting num_layers=2 doesn't change the depth of the hidden layers but instead creates an encoder-decoder type architecture, where the final hidden state representation of the first GRU is passed to the stacked second layer for final decoding. Can someone clarify which is correct? And if it is the latter, can I also get tips on how to increase hidden layer depth?

class my_gru(nn.Module):

    def __init__(self, input_size, hidden_size, num_layers=2):
        super(my_gru, self).__init__()

        self.gru = nn.GRU(input_size, hidden_size, num_layers=num_layers, batch_first=True)
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.nlayers = num_layers
        self.linear = nn.Linear(self.hidden_size, 1)

    def forward(self, inputs, hidden):
        if hidden is None:
            hidden = self.init_hidden(inputs.size(0))

        out, hidden = self.gru(inputs, hidden)
        decoded = self.linear(out)  # project every time step down to a scalar
        return decoded, hidden

    def init_hidden(self, batch_size=1):
        # hidden state shape expected by nn.GRU: (num_layers, batch, hidden_size)
        weight = next(self.parameters())
        return weight.new_zeros(self.nlayers, batch_size, self.hidden_size)
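To make my question concrete, here is a small check I put together (a sketch of my understanding, not something from the docs: it tests whether nn.GRU with num_layers=2 is equivalent to two chained single-layer GRUs where layer 2 consumes the full output sequence of layer 1, rather than only its final hidden state):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len = 4, 3, 5

# One stacked GRU vs. two manually chained single-layer GRUs.
stacked = nn.GRU(input_size, hidden_size, num_layers=2, batch_first=True)
gru1 = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)
gru2 = nn.GRU(hidden_size, hidden_size, num_layers=1, batch_first=True)

# Copy the stacked module's per-layer weights into the two single-layer GRUs
# so all three start from identical parameters.
state = stacked.state_dict()
gru1.load_state_dict({k: v for k, v in state.items() if k.endswith("_l0")})
gru2.load_state_dict({k.replace("_l1", "_l0"): v
                      for k, v in state.items() if k.endswith("_l1")})

x = torch.randn(1, seq_len, input_size)
out_stacked, _ = stacked(x)
out1, _ = gru1(x)            # layer 1 emits one vector per time step
out_chained, _ = gru2(out1)  # layer 2 reads every time step, not just the last

print(torch.allclose(out_stacked, out_chained, atol=1e-6))  # True
```

If this prints True, then the second layer sees the whole output sequence of the first layer at every time step, which would answer the encoder-decoder part of my question.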

I have the same question about nn.LSTM; I'm waiting for the answers as well.