1 & 4) batch is the number of samples within the minibatch
The dimension corresponding to the batch varies, depending on the batch_first argument of the RNN modules.
By default you process samples as (timesteps, batch_samples, input_size), while with batch_first=True the RNN expects sequences in a (batch_samples, timesteps, input_size) format, just like Keras does.
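For example (layer sizes here are just illustrative), the two layouts look like this:

```python
import torch
import torch.nn as nn

# Default layout: (timesteps, batch_samples, input_size)
rnn = nn.RNN(input_size=10, hidden_size=20)
x = torch.randn(5, 3, 10)      # 5 time steps, batch of 3
out, h = rnn(x)                # out: (5, 3, 20)

# batch_first=True: (batch_samples, timesteps, input_size), Keras-style
rnn_bf = nn.RNN(input_size=10, hidden_size=20, batch_first=True)
x_bf = torch.randn(3, 5, 10)   # batch of 3, 5 time steps
out_bf, h_bf = rnn_bf(x_bf)    # out_bf: (3, 5, 20)
```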
Check the documentation for pack_padded_sequence and pad_packed_sequence.
Basically you have to pass your input in a PackedSequence format, which contains the sequence length information, and PyTorch's native RNN modules will deal with the variable lengths without the need for explicit masking.
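A minimal sketch of the pack/unpack round trip (the sizes and lengths are made up for illustration):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

rnn = nn.RNN(input_size=4, hidden_size=8)

# Two sequences of lengths 5 and 3, zero-padded to length 5,
# in the default time-major layout (timesteps, batch, input_size)
padded = torch.randn(5, 2, 4)
padded[3:, 1] = 0              # padding positions of the shorter sequence
lengths = [5, 3]               # lengths sorted in decreasing order

packed = pack_padded_sequence(padded, lengths)
packed_out, h = rnn(packed)    # the RNN skips the padded steps
out, out_lengths = pad_packed_sequence(packed_out)  # back to (5, 2, 8)
```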
RNNCell does the forward pass for a single time step of a sequence.
RNN applies the RNNCell forward pass to every time step of an input sequence -> this is your traditional RNN
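As an illustration (sizes are arbitrary), manually looping an RNNCell over the time dimension produces the same shape of output that nn.RNN computes in a single call:

```python
import torch
import torch.nn as nn

cell = nn.RNNCell(input_size=10, hidden_size=20)

x = torch.randn(6, 3, 10)      # (timesteps, batch, input_size)
h = torch.zeros(3, 20)         # one hidden state per sample in the batch

outputs = []
for t in range(6):             # RNNCell: one forward call per time step
    h = cell(x[t], h)
    outputs.append(h)
out_manual = torch.stack(outputs)   # (6, 3, 20)

rnn = nn.RNN(input_size=10, hidden_size=20)
out_full, h_n = rnn(x)         # nn.RNN: the same loop, done internally
```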
@miguelvr Thank you for your reply.
Just to make sure I understand, you are talking about RNN layer http://pytorch.org/docs/nn.html#rnn
where you say that input (seq_len, batch, input_size) is equivalent to input (timesteps, batch, input_size). Am I correct?
Hi, I have a small question that is relevant to @osm3000's q3:
I am trying to figure out the difference and relation between RNN and RNNCell (or LSTM and LSTMCell)…
Say we assume that we only have 1 layer; according to @miguelvr's answer and the documentation,
it seems like LSTMCell (or RNNCell) allows me to process each time step separately.
While with LSTM (or RNN), we put the entire input sequence in and get the entire output back.
If the above understanding is correct, then my question is why I get totally different results when I try to process a sequence of data.
I followed the simple example in the documentation, i.e.
and I also used the same set of data (I didn't sample it randomly again, but reused the same data), and put the entire input into the LSTM as follows:
lstm = nn.LSTM(10, 20)  # layer = 1
and I compare the output results from both strategies.
But they gave me totally different results…
I am wondering if it is because of the underlying implementation, or if I am using it wrongly?
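One plausible cause worth checking (this is an assumption on my part, not something confirmed above): nn.LSTM and nn.LSTMCell constructed separately each get their own random weight initialization, so they will not agree on the same data. A sketch of a fair comparison, copying the LSTM's single-layer weights into the cell first:

```python
import torch
import torch.nn as nn

lstm = nn.LSTMCell(10, 20)
full_lstm = nn.LSTM(10, 20)    # layer = 1

# Share the weights so both modules compute the same function
lstm.weight_ih.data.copy_(full_lstm.weight_ih_l0.data)
lstm.weight_hh.data.copy_(full_lstm.weight_hh_l0.data)
lstm.bias_ih.data.copy_(full_lstm.bias_ih_l0.data)
lstm.bias_hh.data.copy_(full_lstm.bias_hh_l0.data)

x = torch.randn(6, 3, 10)      # (timesteps, batch, input_size)
hx = torch.zeros(3, 20)
cx = torch.zeros(3, 20)

step_outputs = []
for t in range(6):             # strategy 1: LSTMCell, one step at a time
    hx, cx = lstm(x[t], (hx, cx))
    step_outputs.append(hx)
manual = torch.stack(step_outputs)

full, (hn, cn) = full_lstm(x)  # strategy 2: whole sequence at once
print(torch.allclose(manual, full, atol=1e-5))  # True once weights are shared
```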