I don’t understand why the parameter
seq_len is named this way. As far as I know, this is the number of sequences.
seq_len sounds like a sequence length, which is a completely different thing.
Honestly, all the terminology in the RNN documentation looks quite strange. Let’s take an example:
Description of the input to RNN (https://pytorch.org/docs/master/generated/torch.nn.RNN.html):
input of shape (seq_len, batch, input_size): tensor containing the features of the input sequence.
Do you see that? Tensor containing the FEATURES of the input sequence. So where do I need to provide the FEATURES? Because I only see the following:
(seq_len, batch, input_size) - nothing about features.
In my opinion it should be
(number_of_sequences, number_of_elements, number_of_features), where usually we have
number_of_sequences sequences, which are
number_of_elements long, each element having
number_of_features features, isn’t it?
Sorry, I am very confused, but maybe I am misunderstanding something, so I would appreciate it if someone could clarify.
seq_len is indeed the length of the sequence, such as the number of words in a sentence or the number of characters in a string.
input_size reflects the number of features. Again, in terms of sequences being words in a sentence, this would be the size of the word vectors (e.g., 300). Whatever the number of features is, that will be your
A full example: Let’s say your batch has 32 sentences of the same length of 50 words (e.g., with padding to ensure equal lengths), and you use word vectors (e.g., word2vec or GloVe) of size 300. Then you get
batch_size = 32
seq_len = 50
input_size = 300
By default, RNNs in PyTorch expect as input shape
(seq_len, batch_size, input_size). But if you define an RNN with
batch_first=True, the expected input shape is
(batch_size, seq_len, input_size). It’s really just a matter of preference and convenience.
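To make the two layouts concrete, here is a minimal sketch using the numbers from the example above. The hidden_size of 64 is an arbitrary value chosen only for illustration; it does not come from the discussion.

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size = 32, 50, 300
hidden_size = 64  # assumption: arbitrary value, only for illustration

# Default layout: the sequence dimension comes first.
rnn = nn.RNN(input_size, hidden_size)
x = torch.randn(seq_len, batch_size, input_size)  # (50, 32, 300)
out, h_n = rnn(x)  # accepted as-is

# batch_first=True layout: the batch dimension comes first.
rnn_bf = nn.RNN(input_size, hidden_size, batch_first=True)
x_bf = x.transpose(0, 1).contiguous()  # (32, 50, 300)
out_bf, h_n_bf = rnn_bf(x_bf)  # accepted as-is
```

Either way, the last dimension is the feature dimension, and the same data can be moved between layouts with a single transpose.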
Is it the same for the output? If I define
batch_first=True, do I also get the batch first in the output?
Yes for the output tensor, but not for the hidden state. With batch_first=True, the output has shape (batch_size, seq_len, num_directions * hidden_size), while h_n keeps the shape (num_layers * num_directions, batch_size, hidden_size) regardless of the setting. As far as I understand, the option is just for convenience; otherwise you would have to add unnecessary transpose calls and lengthen your code. The output shapes are spelled out in the documentation you linked.
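A quick way to check this yourself is to print the shapes. The sketch below assumes a single-layer, unidirectional nn.RNN with an arbitrary hidden_size of 64; in current PyTorch releases the output tensor comes back batch-first while h_n does not.

```python
import torch
import torch.nn as nn

# assumption: hidden_size=64 is arbitrary; one layer, unidirectional
rnn = nn.RNN(input_size=300, hidden_size=64, num_layers=1, batch_first=True)

x = torch.randn(32, 50, 300)  # (batch_size, seq_len, input_size)
output, h_n = rnn(x)

print(output.shape)  # torch.Size([32, 50, 64]) -> batch dimension first
print(h_n.shape)     # torch.Size([1, 32, 64])  -> batch dimension second
```

So if you need the final hidden state in batch-first form, you still have to transpose h_n yourself.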
Hope that helps.