Bidirectional GRU/LSTM error

I was converting my models (both the LSTM and the GRU versions) to bidirectional. I thought all I had to do was set the bidirectional parameter to True, but that did not work and raised this error:

RuntimeError: Expected hidden size (2, 10, 100), got (1, 10, 100)

Then I tried to follow the model below

class BiRNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(BiRNN, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(hidden_size*2, num_classes)  # 2 for bidirection
    def forward(self, x):
        # Set initial states
        h0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device) # 2 for bidirection 
        c0 = torch.zeros(self.num_layers*2, x.size(0), self.hidden_size).to(device)
        # Forward propagate LSTM
        out, _ = self.lstm(x, (h0, c0))  # out: tensor of shape (batch_size, seq_length, hidden_size*2)
        # Decode the hidden state of the last time step
        out = self.fc(out[:, -1, :])
        return out

model = BiRNN(input_size, hidden_size, num_layers, num_classes).to(device)


However, that did not work either. It raises this error:
RuntimeError: Expected hidden[0] size (2, 39, 50), got (2, 10, 50)

If I set batch_first to False, then this is the error:
ValueError: Target and input must have the same number of elements. target nelement (20) != input nelement (78)
Thanks in advance.

If you are running the code directly from that page, you may have a cut and paste error. I just cut and pasted it directly and ran it, and it worked.

I can help with your first error:

RuntimeError: Expected hidden size (2, 10, 100), got (1, 10, 100)

It is a little misleading to say that all you need to do is set the bi-directional flag to be True. That is all that is necessary for the nn.LSTM(…) object to set itself up correctly, but it is not all that needs to happen to get your code to run correctly.

Specifically, a bi-directional RNN is very much like a pair of unidirectional RNNs running in parallel, one reading the sequence forwards and one backwards, with their outputs concatenated at the end. Each of those RNNs needs its own hidden and cell states. In PyTorch, the way to do that is to change the shape of the tensors holding the hidden and cell states. This is what the error message was telling you:

RuntimeError: Expected hidden size (2, 10, 100), got (1, 10, 100)

Note the change from 1 to 2. Somewhere in your original code, you were defining the initial hidden and cell states. Find those definitions and change them as the error message suggests.

Depending on what your code looked like, there may be other changes necessary.
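
As a rough illustration of the shape change described above, here is a minimal sketch. The sizes (sequence length 39, batch 10, 951 features, hidden size 50, one layer) are taken from elsewhere in this thread and are only assumptions about the original model; the default sequence-first layout (batch_first=False) is assumed as well.

```python
import torch
import torch.nn as nn

# Assumed sizes, borrowed from the numbers quoted in this thread.
seq_len, batch_size, input_size, hidden_size, num_layers = 39, 10, 951, 50, 1
num_directions = 2  # 2 because bidirectional=True, 1 otherwise

lstm = nn.LSTM(input_size, hidden_size, num_layers, bidirectional=True)

# The first dimension of the initial states is num_layers * num_directions.
h0 = torch.zeros(num_layers * num_directions, batch_size, hidden_size)
c0 = torch.zeros(num_layers * num_directions, batch_size, hidden_size)

x = torch.randn(seq_len, batch_size, input_size)  # (seq, batch, feature)
out, (hn, cn) = lstm(x, (h0, c0))

print(out.shape)  # torch.Size([39, 10, 100]): both directions concatenated
print(hn.shape)   # torch.Size([2, 10, 50]): one state per direction
```

Because the output concatenates both directions, any layer that consumes it (such as a final nn.Linear) would need an input size of hidden_size * 2.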


Thanks for your reply.
Actually, I am following the logic of that code, but my own code is different.
This is how I define the initial hidden state:

    def initHidden(self, N):
        return (Variable(torch.randn(2, N, self.hidden_size)),
                Variable(torch.randn(2, N, self.hidden_size)))

where N is the batch size (10) and the hidden size is 50.
The shape of the input tensor is [39, 10, 951] (sequence, batch, feature),
so I really don't know why it is expecting (2, 39, 50) for the hidden state!
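
One possible explanation for the (2, 39, 50), assuming the LSTM was built with batch_first=True as in the BiRNN class: with that flag, dimension 0 of the input is read as the batch, so a [39, 10, 951] tensor looks like a batch of 39 sequences. The initial states themselves are always laid out as (num_layers * num_directions, batch, hidden_size), whatever batch_first is. The helper below, expected_hidden_shape, is a hypothetical function written for this illustration and is not part of PyTorch.

```python
def expected_hidden_shape(input_shape, hidden_size, num_layers=1,
                          bidirectional=False, batch_first=False):
    """Hypothetical helper: the h0/c0 shape an LSTM or GRU would expect
    for a 3-D input of the given shape."""
    # batch_first only changes which input dimension is read as the batch.
    batch_dim = 0 if batch_first else 1
    num_directions = 2 if bidirectional else 1
    return (num_layers * num_directions, input_shape[batch_dim], hidden_size)


input_shape = (39, 10, 951)  # (sequence, batch, feature), as in the post above

as_seq_first = expected_hidden_shape(
    input_shape, hidden_size=50, bidirectional=True, batch_first=False)
as_batch_first = expected_hidden_shape(
    input_shape, hidden_size=50, bidirectional=True, batch_first=True)

print(as_seq_first)    # (2, 10, 50): matches initHidden(10)
print(as_batch_first)  # (2, 39, 50): the shape named in the error message
```

If that is what is happening, either constructing the LSTM with batch_first=False or transposing the input to (batch, sequence, feature) should make the two shapes agree.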

I am trying to build an LSTM model for time series data. The details of the dataset:
The input data is a time series with 800 subjects, each having a 2D array of 60 rows and 200 columns. I loaded the entire data as a Tensor of shape [800,60,200], and the labels for the classification problem have shape [800,1].
I made a dictionary of the data using the following code:

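class DataCurate(Dataset):
    def __init__(self, l1, l2, transform=None):
        # Body omitted in the original post; storing the arguments is assumed.
        self.l1, self.l2, self.transform = l1, l2, transform

    def __len__(self):
        return len(self.l1)

    def __getitem__(self, index):
        # Assumed: array and label are the index-th entries of the data and labels.
        array, label = self.l1[index], self.l2[index]
        sample = {'time_data': array, 'labels': label}
        return sample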

The data and labels are in the variables x and y. I call data = DataCurate(x, y).

Later on I build an LSTM model for the classification problem using this code:

class RNNModel(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNNModel, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)  # input_size=200, hidden_size=320, num_layers=2
        self.linear = nn.Linear(hidden_size, num_layers, bias=True)
        self.softmax = nn.LogSoftmax()

    def forward(self, x):
        out_packed, state = self.lstm(x)  # RNN
        print("lstm output size: {out.size()}"+str(out_packed.size()))
        out = self.linear(out_packed[-1])  # linear transform
        print("linear output size {out.size()} "+str(out.size()))
        log_probs = F.log_softmax(out,dim=1)
        print("softmax output size {log_probs.size()}"+str(log_probs.size()))
        return log_probs

This gives me an error when I run the training script:
UserWarning: RNN module weights are not part of single contiguous chunk of memory. This means they need to be compacted at every call, possibly greatly increasing memory usage. To compact weights again call flatten_parameters().
out_packed, state = self.lstm(x) # RNN
Traceback (most recent call last):
File “”, line 100, in
output = model(train_inputs.transpose(0,1))
File “/usr/local/lib/python3.5/dist-packages/torch/nn/modules/”, line 477, in call
result = self.forward(*input, **kwargs)
File “/media/iab/disk_a/meghal/test/quickdraw_tutorial_dataset_v1/pytorch_RNN_examples/”, line 26, in forward
out_packed, state = self.lstm(x) # RNN
File “/usr/local/lib/python3.5/dist-packages/torch/nn/modules/”, line 477, in call
result = self.forward(*input, **kwargs)
File “/usr/local/lib/python3.5/dist-packages/torch/nn/modules/”, line 192, in forward
output, hidden = func(input, self.all_weights, hx, batch_sizes)
File “/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/”, line 324, in forward
return func(input, *fargs, **fkwargs)
File “/usr/local/lib/python3.5/dist-packages/torch/nn/_functions/”, line 288, in forward
RuntimeError: param_from.type() == param_to.type() ASSERT FAILED at /pytorch/aten/src/ATen/native/cudnn/RNN.cpp:491, please report a bug to PyTorch. parameter types mismatch

I don't know what this means or how to resolve it. I am completely new to LSTMs.
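
A side note on shapes in RNNModel, which may or may not be related to the cuDNN error above: with batch_first=True the LSTM output is laid out as (batch, sequence, hidden), so indexing it with [-1] selects the last sample in the batch, not the last time step. The sketch below only illustrates that indexing difference; the batch size of 8 is an arbitrary assumption, and the other sizes follow the comment in the model code.

```python
import torch
import torch.nn as nn

# Sizes from the post (60 time steps, 200 features, hidden size 320, 2 layers);
# the batch size of 8 is an arbitrary choice for this illustration.
batch_size, seq_len, input_size, hidden_size, num_layers = 8, 60, 200, 320, 2

lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
x = torch.randn(batch_size, seq_len, input_size)  # (batch, seq, feature)

out, _ = lstm(x)           # (batch, seq, hidden) because batch_first=True
last_sample = out[-1]      # every time step of the LAST SAMPLE in the batch
last_step = out[:, -1, :]  # the LAST TIME STEP of every sample

print(out.shape)          # torch.Size([8, 60, 320])
print(last_sample.shape)  # torch.Size([60, 320])
print(last_step.shape)    # torch.Size([8, 320])
```

For a per-sequence classifier, the (batch, hidden) slice is usually the one to pass to the linear layer. Note also that an input transposed to (sequence, batch, feature) would not match a batch_first=True LSTM.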