Since you posted in nlp, I assume you work with text. A very common application is sentence classification (e.g., sentiment analysis), where each sentence is a sequence of words. Let's say you have a batch of 3 sentences, each containing 10 words (nn.LSTM and nn.GRU by default require sequences of the same length; you can look up padding and packing). That means your batch has the shape (batch_size, seq_len), i.e., (3, 10) with the numbers above. Note that each sentence/sequence is a vector of integers reflecting the index of each word in your vocabulary.
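As a rough sketch, assuming you have already mapped each word to an index in your vocabulary, such a batch could be built like this (the indices are made up, and pad_sequence is just one common way to get sequences of equal length):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

# Three sentences of (originally) different lengths, already mapped to word indices.
sentences = [
    torch.tensor([4, 18, 7, 2, 55, 9, 13, 6, 21, 3]),  # 10 words
    torch.tensor([12, 5, 88, 2, 41, 7, 19]),           # 7 words
    torch.tensor([3, 64, 2, 9, 30, 11, 8, 27]),        # 8 words
]

# Pad shorter sentences with index 0 so that all sequences have the same length.
batch = pad_sequence(sentences, batch_first=True, padding_value=0)
print(batch.shape)  # torch.Size([3, 10]) -> (batch_size, seq_len)
```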
The next step is to push the batch through an nn.Embedding layer to map words (represented by their indices) to word vectors of size, say, 100. The output shape after the embedding layer is then (batch_size, seq_len, embed_dim), i.e., (3, 10, 100) with the numbers above.
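A minimal sketch of that step, assuming a vocabulary size of 1000 (just a made-up number for the example):

```python
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 100  # assumed values for this sketch
embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=embed_dim)

batch = torch.randint(0, vocab_size, (3, 10))  # (batch_size, seq_len) of word indices
embedded = embedding(batch)
print(embedded.shape)                          # torch.Size([3, 10, 100])
```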
This tensor can now serve as input for your nn.LSTM or nn.GRU, which expect as input (batch_size, seq_len, input_size). Note that by default they actually expect (seq_len, batch_size, input_size), so either you transpose() your tensor or you define your RNN layer with batch_first=True.
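Here is a minimal sketch of both options; hidden_size=64 is just an arbitrary choice for the example:

```python
import torch
import torch.nn as nn

embedded = torch.randn(3, 10, 100)  # stand-in for the embedding output: (batch_size, seq_len, embed_dim)

# Option 1: keep the batch dimension first
lstm = nn.LSTM(input_size=100, hidden_size=64, batch_first=True)
output, (h_n, c_n) = lstm(embedded)
print(output.shape)  # torch.Size([3, 10, 64])

# Option 2: move the sequence dimension to the front (the default layout)
lstm_default = nn.LSTM(input_size=100, hidden_size=64)
output, (h_n, c_n) = lstm_default(embedded.transpose(0, 1))
print(output.shape)  # torch.Size([10, 3, 64])
```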
Anyway, embed_dim, i.e., the size of your word vectors, defines input_size (100 in the example above). Summing up:

- batch_size is the number of sentences in your batch (e.g., 3)
- seq_len is the number of items in your sequences, such as words in a sentence (e.g., 10)
- input_size is the size of the tensor/vector that represents a single(!) item in your sequence, such as the 100-dim word vector for each word in a sentence.
The shapes of the inputs and outputs are very well defined; see, for example, the documentation for nn.LSTM.
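As a quick sanity check against those documented shapes, you can print them yourself; hidden_size=64 and a single layer are again just assumptions for the example:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=100, hidden_size=64, batch_first=True)
x = torch.randn(3, 10, 100)  # (batch_size, seq_len, input_size)

output, (h_n, c_n) = lstm(x)
print(output.shape)  # (batch_size, seq_len, hidden_size)     -> torch.Size([3, 10, 64])
print(h_n.shape)     # (num_layers, batch_size, hidden_size)  -> torch.Size([1, 3, 64])
print(c_n.shape)     # (num_layers, batch_size, hidden_size)  -> torch.Size([1, 3, 64])
```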