Adding More Linear Layers in the RNN

Dear Friends

I’m following the tutorial.

In the Exercise section, you will see:
-> Add more linear layers

Does adding more linear layers mean creating hidden-to-hidden layers?

For instance, below is the tutorial code:

from torch import cat, zeros
from torch.nn import Linear, LogSoftmax, Module

class RNN(Module):
    def __init__(self, input_size, hidden_size,
                 output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        size_sum = input_size + hidden_size
        self.i2h = Linear(size_sum, hidden_size)
        self.h2o = Linear(size_sum, output_size)
        self.softmax = LogSoftmax(dim=1)

    def forward(self, input_, hidden_):
        combined = cat(tensors=(input_, hidden_), dim=1)
        hidden_ = self.i2h(input=combined)
        output = self.h2o(input=combined)
        output = self.softmax(input=output)
        return output, hidden_

    def init_hidden(self):
        return zeros(1, self.hidden_size)

If I add a hidden-to-hidden layer, does it mean I’m adding more linear layers or am I mistaken?

class RNN(Module):
    def __init__(self, input_size, hidden_size,
                 output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        size_sum = input_size + hidden_size
        self.i2h = Linear(size_sum, hidden_size)
        self.h2h = Linear(hidden_size, hidden_size)
        self.h2o = Linear(hidden_size, output_size)
        self.softmax = LogSoftmax(dim=1)

    def forward(self, input_, hidden_):
        combined = cat(tensors=(input_, hidden_), dim=1)
        hidden_ = self.i2h(input=combined)
        hidden_ = self.h2h(input=hidden_)
        hidden_ = self.h2h(input=hidden_)
        hidden_ = self.h2h(input=hidden_)
        hidden_ = self.h2h(input=hidden_)
        hidden_ = self.h2h(input=hidden_)
        output = self.h2o(input=hidden_)
        output = self.softmax(input=output)
        return output, hidden_

    def init_hidden(self):
        return zeros(1, self.hidden_size)

The reason I’m asking is that when I run the second RNN module, my accuracy is 8% lower than with the previous one. I may be interpreting the “add more linear layers” statement wrong.

Thanks

You call hidden_ = self.h2h(input=hidden_) five times in your forward() method. I don’t think that makes sense: it’s the same layer, so you’re applying the same weight matrix over and over. Try doing it only once.
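
For example, a minimal sketch of your forward() with h2h applied just once, keeping your layer definitions as they are:

def forward(self, input_, hidden_):
    combined = cat(tensors=(input_, hidden_), dim=1)
    hidden_ = self.i2h(input=combined)
    hidden_ = self.h2h(input=hidden_)  # extra linear layer, applied once
    output = self.h2o(input=hidden_)
    output = self.softmax(input=output)
    return output, hidden_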

But yes, “adding more linear layers” is rather ambiguous. An arguably harmless addition would be to add a linear layer between self.h2o and self.softmax.

Also note that you need a non-linear activation function between linear layers (e.g., ReLU). It works without one, of course, but two consecutive linear layers then collapse into a single linear transformation, so you essentially give up the benefits of the additional layers.
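
Something like this, for example (just a sketch; o2o is a name I made up for the extra layer, and relu here is torch.nn.functional.relu):

# in __init__():
self.h2o = Linear(size_sum, output_size)
self.o2o = Linear(output_size, output_size)  # hypothetical extra linear layer

# in forward():
output = self.h2o(input=combined)
output = relu(input=output)  # non-linearity between the two linear layers
output = self.o2o(input=output)
output = self.softmax(input=output)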


Dear Chris

From what I gather, I need to add a layer between the output and softmax layers:

from torch.nn.functional import relu  # in addition to the imports above

class RNN(Module):
    def __init__(self, input_size, hidden_size,
                 output_size):
        super(RNN, self).__init__()
        self.hidden_size = hidden_size
        size_sum = input_size + hidden_size
        self.i2h = Linear(size_sum, hidden_size)
        self.h2o = Linear(hidden_size, output_size)
        self.o2h = Linear(output_size, hidden_size)
        self.softmax = LogSoftmax(dim=1)

    def forward(self, input_, hidden_):
        combined = cat(tensors=(input_, hidden_), dim=1)
        hidden_ = self.i2h(input=combined)
        output = self.h2o(input=hidden_)
        hidden_ = self.o2h(input=output)
        hidden_ = relu(input=hidden_)
        output = self.h2o(input=hidden_)
        return output, hidden_

    def init_hidden(self):
        return zeros(1, self.hidden_size)

When I build the network like the one above, it doesn’t learn anything (0% accuracy).

I think I need more time to figure out where I’m going wrong.

Thanks for the answer.

Does everything work when you use the code given in the tutorial?

I can’t see any major reason why adding a layer would fail. Well, you do seem to be missing

output = self.softmax(output)

in your forward() method.
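
i.e., something like this (a sketch based on your code, with the softmax put back at the end):

def forward(self, input_, hidden_):
    combined = cat(tensors=(input_, hidden_), dim=1)
    hidden_ = self.i2h(input=combined)
    output = self.h2o(input=hidden_)
    hidden_ = self.o2h(input=output)
    hidden_ = relu(input=hidden_)
    output = self.h2o(input=hidden_)
    output = self.softmax(input=output)  # the line that was missing
    return output, hidden_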

In practice, an accuracy of 0% can’t be right. The tutorial shows a classification task, so just by chance some predictions should be correct. Without any training, the accuracy of a classification model is usually around 1/num_classes, which more or less represents random guessing.
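
For example (assuming the tutorial’s name-classification data with its 18 language classes; substitute your own number of classes):

num_classes = 18  # assumption: 18 categories, as in the name-classification dataset
chance_accuracy = 1 / num_classes
print(f"{chance_accuracy:.1%}")  # roughly 5.6%, i.e., pure guessing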