RNN for sequence prediction

Hello,

Previously I used Keras for CNNs, so I am a newbie at both PyTorch and RNNs. In Keras you can write a script for an RNN for sequence prediction like this:

from keras.models import Sequential
from keras.layers import LSTM, Dense, Activation

in_out_neurons = 1
hidden_neurons = 300
# length_of_sequences is defined elsewhere (the fixed input length Keras expects)

model = Sequential()
model.add(LSTM(hidden_neurons, batch_input_shape=(None, length_of_sequences, in_out_neurons), return_sequences=False))
model.add(Dense(in_out_neurons))
model.add(Activation("linear"))

but when it comes to PyTorch I don't know how to implement it. I directly translated the code above into the code below, but it doesn't work.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.rnn1 = nn.GRU(input_size=seq_len,
                            hidden_size=128,
                            num_layers=1)
        self.dense1 = nn.Linear(128, 1)

    def forward(self, x, hidden):
        x, hidden = self.rnn1(x, hidden)
        x = self.dense1(x)
        return x, hidden

    def init_hidden(self, batch_size):
        weight = next(self.parameters()).data
        return Variable(weight.new(128, batch_size, 1).zero_())

How can I implement something like the Keras code? Thank you.

The input_size argument to any RNN says how many features there will be for each step in the sequence, not what its length is going to be. Keras uses static graphs, so it needs to know the length of the sequence upfront; PyTorch has dynamic autodifferentiation, so it doesn't care about the sequence length - you can use a different one at every iteration.
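
For example, a minimal sketch (the sizes here are made up) showing one GRU with input_size=1 consuming sequences of two different lengths:

import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.GRU(input_size=1, hidden_size=128, num_layers=1)
h0 = Variable(torch.zeros(1, 4, 128))        # (num_layers, batch, hidden_size)

x_short = Variable(torch.randn(5, 4, 1))     # (seq_len=5, batch=4, input_size=1)
x_long = Variable(torch.randn(20, 4, 1))     # (seq_len=20, batch=4, input_size=1)

out_short, h = rnn(x_short, h0)              # out_short: (5, 4, 128)
out_long, h = rnn(x_long, h0)                # out_long: (20, 4, 128)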

See the GRU docs for more details on the arguments.

Apart from this, your module looks good to me!

Thank you for your quick response, but the word features in the context of RNNs is still unclear to me. The GRU docs say,

input : A (seq_len x batch x input_size) tensor containing the features of the input sequence.

and

input_size – The number of expected features in the input x

For example, if you input a sequence

[[[ 0.1,  0.2]],
 [[ 0.1,  0.2]],
 [[ 0.3,  0.1]]]

then seq_len is 3, batch is 1, and input_size, i.e. features, is 2?

Correct. features or input_size says how many dimensions each data point has.
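
A quick sketch to check those shapes (the hidden size here is arbitrary):

import torch
import torch.nn as nn
from torch.autograd import Variable

x = Variable(torch.Tensor([[[0.1, 0.2]],
                           [[0.1, 0.2]],
                           [[0.3, 0.1]]]))   # (seq_len=3, batch=1, input_size=2)

rnn = nn.GRU(input_size=2, hidden_size=4, num_layers=1)
h0 = Variable(torch.zeros(1, 1, 4))          # (num_layers, batch, hidden_size)
out, h = rnn(x, h0)
print(out.size())                            # (3, 1, 4): one output per time step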

Thanks a lot for your help; finally, the code below works:

import torch
import torch.nn as nn
from torch.autograd import Variable

features = 1
seq_len = 10
hidden_size = 128
batch_size = 32

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.rnn1 = nn.GRU(input_size=features,
                            hidden_size=hidden_size,
                            num_layers=1)
        self.dense1 = nn.Linear(hidden_size, 1)

    def forward(self, x, hidden):
        x, hidden = self.rnn1(x, hidden)
        x = x.select(0, seq_len-1).contiguous()  # take the output of the last time step
        x = x.view(-1, hidden_size)
        x = self.dense1(x)
        return x, hidden

    def init_hidden(self):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, batch_size, hidden_size).zero_())

model = Net()
model.cuda()
hidden = model.init_hidden()

X_train_1 = X_train[0:batch_size].reshape(seq_len, batch_size, features)  # X_train / y_train are prepared elsewhere (not shown)
y_train_1 = y_train[0:batch_size]
model.zero_grad()
T = torch.Tensor
X_train_1, y_train_1 = T(X_train_1), T(y_train_1)
X_train_1, y_train_1 = Variable(X_train_1).cuda(), Variable(y_train_1).cuda()

output, hidden = model(X_train_1, Variable(hidden.data))

Thanks for your help. As I wrote above, the script “works”, in the sense that it runs, but the loss doesn't decrease over the epochs, so please give me some advice. I think the related parts are:

class Net(nn.Module):
    def __init__(self, features, cls_size):
        super(Net, self).__init__()
        self.rnn1 = nn.GRU(input_size=features,
                            hidden_size=hidden_size,
                            num_layers=1)
        self.dense1 = nn.Linear(hidden_size, cls_size)

    def forward(self, x, hidden):
        x, hidden = self.rnn1(x, hidden)
        x = x.select(0, maxlen-1).contiguous()
        x = x.view(-1, hidden_size)
        x = F.softmax(self.dense1(x))
        return x, hidden

    def init_hidden(self, batch_size=batch_size):
        weight = next(self.parameters()).data
        return Variable(weight.new(1, batch_size, hidden_size).zero_())

def var(x):
    x = Variable(x)
    if cuda:
        return x.cuda()
    else:
        return x

model = Net(features=features, cls_size=len(chars))
if cuda:
    model.cuda()
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=lr)

def train():
    model.train()
    hidden = model.init_hidden()
    for epoch in range(len(sentences) // batch_size):  # note: this loops over batches, not epochs
        X_batch = var(torch.FloatTensor(X[:, epoch*batch_size: (epoch+1)*batch_size, :]))
        y_batch = var(torch.LongTensor(y[epoch*batch_size: (epoch+1)*batch_size]))
        model.zero_grad()
        output, hidden = model(X_batch, var(hidden.data))
        loss = criterion(output, y_batch)
        loss.backward()
        optimizer.step()

for epoch in range(nb_epochs):
    train()

The input is a one-hot vector, and I tried changing the learning rate, but the result is the same.

I'm not sure; it's hard to spot bugs in code that you can't run. Why do you do this:

x = x.select(0, maxlen-1).contiguous()

Don’t you want to return predictions for the whole sequence? It seems to me that you’re only taking the last output.

In fact, I'm trying to (re)implement Keras's text generation example in PyTorch. In Keras's recurrent layers, there is

  • return_sequences: Boolean. Whether to return the last output in the output sequence, or the full sequence.

and in the example, this is false, so I think taking only the last output is needed.

I’m not sure, I don’t know Keras. I’m just pointing it out (it might be easier to do x[-1] to achieve the same thing).
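
For example, a tiny sketch (assuming the GRU output has shape (seq_len, batch, hidden_size)):

import torch
from torch.autograd import Variable

out = Variable(torch.randn(10, 32, 128))     # (seq_len, batch, hidden_size)
last_a = out[-1]                             # (batch, hidden_size): output of the final step
last_b = out.select(0, out.size(0) - 1)      # same values, written with select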

If you have the full code available somewhere I can take a look.

OK, thanks. Does

x = x[-1] i.e. x = x.select(0, maxlen-1).contiguous()

interfere with backpropagation?

I uploaded my code here

How would they interfere? They both should be ok.

I'm not certain, but since I use only the last output, I think this may have a bad influence on backprop.
I'll check the Keras example again. Thank you.

Finally I found that I had misused the loss function torch.nn.CrossEntropyLoss. I changed the loss to nn.NLLLoss, applied to log_softmax(output) and the target, and now the loss decreases as expected.
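
For reference, nn.CrossEntropyLoss already combines a log-softmax with nn.NLLLoss, so the extra softmax inside the model was throwing it off. A quick sketch with made-up logits showing that the two formulations give the same loss:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.randn(4, 10))                  # (batch, num_classes), raw scores
target = Variable(torch.LongTensor([1, 0, 3, 7]))

loss_a = nn.CrossEntropyLoss()(logits, target)         # expects raw logits
loss_b = nn.NLLLoss()(F.log_softmax(logits), target)   # expects log-probabilities
# loss_a and loss_b give the same value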

And you removed the softmax from the module, right?

Right. So now,

class Net(nn.Module):
    ...
    def forward(self, x, hidden):
        x, hidden = self.rnn1(x, hidden)
        x = x.select(0, maxlen-1).contiguous()
        x = x.view(-1, hidden_size)
        x = F.relu(self.dense1(x))
        x = F.log_softmax(self.dense2(x))
        return x, hidden
...
criterion = nn.NLLLoss()
...
def train():
    model.train()
    hidden = model.init_hidden()
    for epoch in range(len(sentences) // batch_size):  # note: this loops over batches, not epochs
        X_batch = var(torch.FloatTensor(X[:, epoch*batch_size: (epoch+1)*batch_size, :]))
        y_batch = var(torch.LongTensor(y[epoch*batch_size: (epoch+1)*batch_size]))
        model.zero_grad()
        output, hidden = model(X_batch, var_pair(hidden))  # var_pair: helper from the full script (not shown here)
        loss = criterion(output, y_batch)
        loss.backward()
        optimizer.step()

Yup, that looks good! Note that you can now pass in hidden = None in the first iteration; the RNN will initialize a zero-filled hidden state for you. You might need to update PyTorch, though.
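
For example, a minimal sketch with a recent PyTorch (the sizes are arbitrary):

import torch
import torch.nn as nn
from torch.autograd import Variable

rnn = nn.GRU(input_size=1, hidden_size=128, num_layers=1)
x = Variable(torch.randn(10, 32, 1))   # (seq_len, batch, input_size)

out, hidden = rnn(x)                   # no initial hidden state given: defaults to zeros
out, hidden = rnn(x, hidden)           # later calls can reuse the returned hidden state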

I have a question about the number of parameters in an RNN. I defined an RNN layer and got its parameters. I thought the number of parameters in an RNN layer should differ for different input lengths. However, when I use parameters() to inspect them, the number of parameters seems to be the same as that of an RNN layer with only one time step.

How to understand this fact? Thank you!

Your model is going to be the same whatever the length of your input is.
In Torch we used to clone the model as many times as there were time steps, while sharing the parameters, because it is the same model, just unrolled over time.
The number of parameters will change when your input dimensionality changes (the size of x[t], for a given t = 1, ..., T), not when T changes.
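
For example, a quick sketch (the sizes are arbitrary):

import torch
import torch.nn as nn
from torch.autograd import Variable

def count_params(m):
    return sum(p.data.numel() for p in m.parameters())

rnn_a = nn.GRU(input_size=10, hidden_size=20, num_layers=1)
rnn_b = nn.GRU(input_size=50, hidden_size=20, num_layers=1)
print(count_params(rnn_a))   # depends on input_size and hidden_size ...
print(count_params(rnn_b))   # ... so this one is larger

# the same rnn_a handles any T with the same parameters
h0 = Variable(torch.zeros(1, 3, 20))
out_short, _ = rnn_a(Variable(torch.randn(5, 3, 10)), h0)     # T = 5
out_long, _ = rnn_a(Variable(torch.randn(100, 3, 10)), h0)    # T = 100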

If it is still not clear, you can go over my lectures on RNNs (ref.).
And if it is still confusing, wait for the PyTorch video tutorials I’m currently working on.

I see. Thank you very much!

Hi,

Sorry for reopening this topic. I also just moved to PyTorch from Keras, and I am super confused about how RNNs work.
I am confused about:

  1. I don't understand what 'batch' means in the context of PyTorch.
  2. Since RNNs can accept variable-length sequences, can someone please give a small example of this?
  3. What is the difference between an RNN cell and an RNN?
    http://pytorch.org/docs/nn.html#torch.nn.RNNCell
    http://pytorch.org/docs/nn.html#rnn
  4. For the RNN cell, why does the documentation say the input is input (batch, input_size), while in the example given in the documentation the input is input = Variable(torch.randn(6, 3, 10))?

Thank you