GRU can't deal with self.hidden (AttributeError: 'tuple' object has no attribute 'size')

Hey guys!

I’m currently working on a classification task using an LSTM. For faster training, I wanted to try a GRU instead. I didn’t change anything else, only the model itself. But I get this error:

File "/LSTM-Classification-Pytorch/utils/", line 33, in forward
    lstm_out, self.hidden = self.lstm(x, self.hidden)
  File "/python3.5/site-packages/torch/nn/modules/", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/python3.5/site-packages/torch/nn/modules/", line 190, in forward
    self.check_forward_args(input, hx, batch_sizes)
  File "/python3.5/site-packages/torch/nn/modules/", line 162, in check_forward_args
    check_hidden_size(hidden, expected_hidden_size)
  File "/python3.5/site-packages/torch/nn/modules/", line 153, in check_hidden_size
    if tuple(hx.size()) != expected_hidden_size:
AttributeError: 'tuple' object has no attribute 'size'

Below is my code:

class LSTM(nn.Module):

    def __init__(self, embedding_dim, hidden_dim, alphabet_size, num_layers, label_size, batch_size):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.num_layers = num_layers
        self.drop_out = nn.Dropout()  # p = 0.5

        self.embeddings = nn.Embedding(alphabet_size, embedding_dim)
        # changed from nn.LSTM to nn.GRU
        self.lstm = nn.GRU(embedding_dim, hidden_dim, bidirectional=True)
        self.hidden2label = nn.Linear(hidden_dim * 2, label_size)  # hidden_dim * 2 for bidirectional
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # self.num_layers * 2 for bidirectional
        h0 = Variable(torch.zeros(2, self.batch_size, self.hidden_dim))
        c0 = Variable(torch.zeros(2, self.batch_size, self.hidden_dim))
        return h0, c0

    def forward(self, text):
        embeds = self.embeddings(text)
        embeds = self.drop_out(embeds)
        x = embeds.view(len(text), self.batch_size, -1)
        lstm_out, self.hidden = self.lstm(x, self.hidden)
        sent_rep = torch.max(lstm_out, 0)
        y = self.hidden2label(sent_rep[0])

        return y

It seems like the GRU can’t take the hidden states because they are passed as a tuple. But when I use the LSTM, the hidden states look the same and it handles them fine.
Is there something I’m missing?

Thank you for your help!

init_hidden in your code returns both h0 and c0. A GRU takes only h0 (the hidden state). An LSTM, on the other hand, takes a tuple of h0 (hidden state) and c0 (cell state).
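
A minimal sketch of that fix, assuming PyTorch 0.4 or later (where plain tensors no longer need the Variable wrapper). The class name GRUClassifier, the attribute name self.gru, and the sizes in the usage lines are made up for illustration; everything else follows the code in the question. The only change that matters is that init_hidden returns a single tensor:

```python
import torch
import torch.nn as nn


class GRUClassifier(nn.Module):
    """Hypothetical GRU variant of the LSTM class from the question."""

    def __init__(self, embedding_dim, hidden_dim, alphabet_size, label_size, batch_size):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.batch_size = batch_size
        self.drop_out = nn.Dropout()  # p = 0.5
        self.embeddings = nn.Embedding(alphabet_size, embedding_dim)
        self.gru = nn.GRU(embedding_dim, hidden_dim, bidirectional=True)
        self.hidden2label = nn.Linear(hidden_dim * 2, label_size)  # * 2 for bidirectional
        self.hidden = self.init_hidden()

    def init_hidden(self):
        # 1 layer * 2 directions. A GRU has no cell state, so this returns
        # a single tensor instead of the (h0, c0) tuple an LSTM expects.
        return torch.zeros(2, self.batch_size, self.hidden_dim)

    def forward(self, text):
        # text: LongTensor of shape (seq_len, batch_size)
        embeds = self.drop_out(self.embeddings(text))
        x = embeds.view(len(text), self.batch_size, -1)
        gru_out, self.hidden = self.gru(x, self.hidden)
        sent_rep, _ = torch.max(gru_out, 0)  # max over the time dimension
        return self.hidden2label(sent_rep)


# Illustrative sizes only (assumptions, not taken from the question).
model = GRUClassifier(embedding_dim=8, hidden_dim=16, alphabet_size=30,
                      label_size=3, batch_size=4)
text = torch.randint(0, 30, (5, 4))  # (seq_len=5, batch_size=4)
y = model(text)
print(y.shape)  # torch.Size([4, 3])
```

If you switch back to nn.LSTM, init_hidden would need to return the (h0, c0) tuple again.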

Damn, I totally missed that. I should learn more about how a GRU works. Thank you!

@Lady_Hangaku can you please elaborate on how you fixed this issue? I am facing the same problem.
Thank you!

@pyTorch_beginner, there is a chance that your problem is in the training loop.
You can implement the forward method the same way for both LSTM and GRU:

def forward(self, x, h):
    out, h = self.lstm(x, h)
    out = self.fc(self.relu(out[:, -1]))
    return out, h

Afterwards, when calling the model:
output, h = net(inputs, h)
Usually, with an LSTM, due to the presence of the cell state, we change the hidden state to something like this:
h = tuple([each.data for each in h])
but for a GRU, where h is a single tensor:
h = h.data
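
The LSTM/GRU difference above can be handled in one place. A minimal sketch, assuming the goal of that step is to cut the hidden state off from the previous batch's computation graph before reusing it. detach_hidden is a hypothetical helper name, and it uses .detach(), which newer PyTorch versions recommend over .data:

```python
def detach_hidden(h):
    """Return the hidden state cut off from the previous batch's graph.

    Accepts either a single tensor (as used by nn.GRU) or a tuple of
    tensors (h, c) (as used by nn.LSTM) and returns the same structure.
    """
    if isinstance(h, tuple):
        # LSTM case: detach hidden state and cell state separately.
        return tuple(each.detach() for each in h)
    # GRU case: there is only one tensor to detach.
    return h.detach()


# Sketch of use inside a training loop (net, inputs and h are assumed
# to be defined as in the post above):
#     h = detach_hidden(h)
#     output, h = net(inputs, h)
```

With this, the training loop stays the same whether the model wraps an LSTM or a GRU.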