RNN for sequence prediction

1 & 4)
batch is the number of samples within the minibatch

The dimension corresponding to the batch varies, depending on the batch_first argument of the RNN modules.

By default the RNN expects input shaped (timesteps, batch_samples, input_size). With batch_first=True it expects (batch_samples, timesteps, input_size) instead, just like Keras does.
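
The two layouts can be sketched as follows. This is a minimal illustration, not code from the thread: the sizes (5 time steps, 3 samples, 10 features, 20 hidden units) are arbitrary assumptions.

```python
import torch
import torch.nn as nn

# Arbitrary sizes chosen for illustration only.
seq_len, batch, input_size, hidden_size = 5, 3, 10, 20

# Default layout: (timesteps, batch_samples, input_size)
rnn = nn.RNN(input_size, hidden_size)
x = torch.randn(seq_len, batch, input_size)
out, h_n = rnn(x)
# out: (seq_len, batch, hidden_size)

# batch_first=True layout: (batch_samples, timesteps, input_size)
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
x_bf = x.transpose(0, 1).contiguous()
out_bf, h_n_bf = rnn_bf(x_bf)
# out_bf: (batch, seq_len, hidden_size)
# h_n_bf keeps the (num_layers, batch, hidden_size) layout in both cases.

print(tuple(out.shape), tuple(out_bf.shape), tuple(h_n_bf.shape))
```

Note that batch_first only changes the layout of the input and output tensors; the returned hidden state keeps the batch in its second dimension either way.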

Check the documentation for pack_padded_sequence and pad_packed_sequence (in torch.nn.utils.rnn).

Basically you have to pass your input as a PackedSequence, which contains the sequence length information, and PyTorch’s native RNN modules will deal with the variable lengths without the need for explicit masking.
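
A minimal sketch of that workflow, assuming a zero-padded batch of three sequences with lengths 4, 2 and 1 (made-up values), already sorted from longest to shortest as pack_padded_sequence expects by default:

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

torch.manual_seed(0)

# Hypothetical lengths, sorted in descending order.
lengths = torch.tensor([4, 2, 1])

# Zero-padded batch in the default layout: (max_len, batch, input_size)
padded = torch.zeros(4, 3, 10)
for b, n in enumerate(lengths.tolist()):
    padded[:n, b] = torch.randn(n, 10)

rnn = nn.RNN(10, 20)

# Pack, run the RNN, then unpack back to a padded tensor.
packed = pack_padded_sequence(padded, lengths)
packed_out, h_n = rnn(packed)
out, out_lengths = pad_packed_sequence(packed_out)

# out: (max_len, batch, hidden_size); steps past each length are zeros.
# h_n holds the state at the last valid step of each sequence.
print(tuple(out.shape), out_lengths.tolist())
```

The point of packing is that h_n reflects the last real time step of each sequence rather than the last padded one.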

RNNCell does the forward pass for a single time step of a sequence.
RNN applies the RNNCell forward pass to every time step of an input sequence -> this is your traditional RNN
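
To make the distinction concrete, here is a rough sketch of stepping an RNNCell over a sequence by hand, which is the loop that nn.RNN runs internally for a single layer. The sizes and the zero initial hidden state are assumptions made for the example.

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(10, 20)

x = torch.randn(6, 3, 10)   # (timesteps, batch, input_size)
h = torch.zeros(3, 20)      # assumed initial hidden state: zeros

outputs = []
for t in range(x.size(0)):
    # One call processes a single time step for the whole batch.
    h = cell(x[t], h)
    outputs.append(h)

# Stack the per-step hidden states into (timesteps, batch, hidden_size),
# the same shape nn.RNN returns as its output.
outputs = torch.stack(outputs)
print(tuple(outputs.shape))
```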

@miguelvr Thank you for your reply.
Just to make sure I understand: you are talking about the RNN layer,
where input (seq_len, batch, input_size) is equivalent to input (timesteps, batch, input_size). Am I correct?

you’re correct…

I edited my previous post to answer your other questions.

Thank you very much @miguelvr , this is much clearer now

Hi, I have a small question that is relevant to @osm3000’s q3:
I am trying to figure out the difference and relation between RNN and RNNCell (or LSTM and LSTMCell)…
Say we assume that we only have 1 layer. According to @miguelvr’s answer and the documentation,
it seems like LSTMCell (or RNNCell) allows me to process each time step separately,
while with LSTM (or RNN) we pass in the entire input sequence and get the entire output sequence back.

If the above understanding is correct, then my question is why I get totally different results when I try to process a sequence of data.
I followed the simple example in the documentation, i.e.

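import torch
import torch.nn as nn

rnn = nn.LSTMCell(10, 20)
input = torch.randn(6, 3, 10)  # (timesteps, batch, input_size)
hx = torch.randn(3, 20)
cx = torch.randn(3, 20)
output = []
for i in range(6):
    # one time step per call; keep every hidden state
    hx, cx = rnn(input[i], (hx, cx))
    output.append(hx)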

I also used the same set of data (I didn’t sample it again; it is exactly the same data), and passed the entire input into the LSTM as follows:

lstm = nn.LSTM(10, 20)  # num_layers = 1

Then I compared the output results from both strategies.
But they gave me totally different results…
I am wondering whether it is because of the underlying implementation, or whether I am using it wrongly?

thank you!
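
One likely cause, offered as a sketch rather than a diagnosis of the code above: an LSTMCell and an LSTM that are constructed separately get independently initialized random weights, and nn.LSTM starts from zero states unless (h_0, c_0) is passed in. Assuming a single unidirectional layer, copying the cell's parameters into the LSTM and passing the same initial states should make the two agree:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

cell = nn.LSTMCell(10, 20)
lstm = nn.LSTM(10, 20)  # single layer, unidirectional

# Give both modules identical parameters (layer 0 of the LSTM).
with torch.no_grad():
    lstm.weight_ih_l0.copy_(cell.weight_ih)
    lstm.weight_hh_l0.copy_(cell.weight_hh)
    lstm.bias_ih_l0.copy_(cell.bias_ih)
    lstm.bias_hh_l0.copy_(cell.bias_hh)

x = torch.randn(6, 3, 10)   # (timesteps, batch, input_size)
h0 = torch.randn(3, 20)
c0 = torch.randn(3, 20)

# Strategy 1: step the cell manually and collect every hidden state.
hx, cx = h0, c0
cell_outputs = []
for t in range(x.size(0)):
    hx, cx = cell(x[t], (hx, cx))
    cell_outputs.append(hx)
cell_outputs = torch.stack(cell_outputs)

# Strategy 2: feed the whole sequence, with the same initial states.
# nn.LSTM expects states shaped (num_layers, batch, hidden_size).
lstm_outputs, (h_n, c_n) = lstm(x, (h0.unsqueeze(0), c0.unsqueeze(0)))

same = torch.allclose(cell_outputs, lstm_outputs, atol=1e-5)
print(same)
```

If the outputs still differ after matching both the weights and the initial states, the cause lies somewhere else.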


Hi @jdily, I encountered the same problem! Did you figure it out? I posted my problem here.

Don’t we just do x[:,:,-1] to get the last output and then pass it to Dense layer?
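
A small sketch of selecting the last output, assuming x is the RNN output with batch_first=True and all sequences have the same length. With that layout the time axis is dimension 1, so the last step is x[:, -1, :], whereas x[:, :, -1] would pick the last hidden feature at every step. nn.Linear stands in for the Dense layer, and the output size of 1 is an arbitrary assumption.

```python
import torch
import torch.nn as nn

rnn = nn.RNN(10, 20, batch_first=True)
dense = nn.Linear(20, 1)    # assumed output size, for illustration

inp = torch.randn(3, 6, 10)  # (batch, timesteps, input_size)
x, h_n = rnn(inp)            # x: (batch, timesteps, hidden_size)

last = x[:, -1, :]           # last time step: (batch, hidden_size)
pred = dense(last)           # (batch, 1)

# For a single-layer, unidirectional RNN this matches the final hidden state.
print(torch.allclose(last, h_n[-1]), tuple(pred.shape))
```

With the default layout (timesteps, batch, hidden_size) the equivalent selection would be x[-1]. For padded, variable-length batches the last valid step differs per sequence, so the returned hidden state is the safer choice there.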