I am trying to predict the next word using an LSTM.
These are my layers:
# define layers
self.embedding = nn.Embedding(num_embeddings=vocab_size,
                              embedding_dim=input_length)
self.lstm = nn.LSTM(input_size=input_length,
                    hidden_size=hidden_dim,
                    num_layers=num_layers,
                    dropout=drop_prob,
                    batch_first=True,  # input and output tensors are (batch, seq, feature)
                    bidirectional=False)
self.fc = nn.Linear(hidden_dim, output_size)
self.dropout = nn.Dropout(p=drop_prob)
self.LogSoftMax = nn.LogSoftmax(dim=2)  # normalize over the vocabulary dim; result size [batch_size, seq_len, output_size]
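For context, a minimal self-contained version of the module might look like the sketch below. It assumes the layer definitions above sit inside `__init__`, and the class name `NextWordLSTM`, the `forward` method, and the concrete hyperparameter values are all my own illustration, not from the original code:

```python
import torch
import torch.nn as nn

class NextWordLSTM(nn.Module):
    """Hypothetical wrapper around the layers shown in the question."""
    def __init__(self, vocab_size, input_length, hidden_dim,
                 num_layers, drop_prob, output_size):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, input_length)
        self.lstm = nn.LSTM(input_size=input_length,
                            hidden_size=hidden_dim,
                            num_layers=num_layers,
                            dropout=drop_prob,
                            batch_first=True,
                            bidirectional=False)
        self.dropout = nn.Dropout(p=drop_prob)
        self.fc = nn.Linear(hidden_dim, output_size)
        self.log_softmax = nn.LogSoftmax(dim=2)  # normalize over the vocabulary

    def forward(self, x):                      # x: [batch_size, seq_len] word indices
        emb = self.embedding(x)                # [batch, seq, input_length]
        out, _ = self.lstm(emb)                # [batch, seq, hidden_dim]
        out = self.dropout(out)
        return self.log_softmax(self.fc(out))  # [batch, seq, output_size]

model = NextWordLSTM(vocab_size=10000, input_length=64, hidden_dim=128,
                     num_layers=2, drop_prob=0.3, output_size=10000)
x = torch.randint(0, 10000, (4, 7))  # a dummy batch of word indices
y = model(x)
print(y.shape)  # torch.Size([4, 7, 10000])
```

Note that each position's exponentiated outputs sum to 1 only if `LogSoftmax` runs over the last dimension, which is why `dim=2` matters here.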
I train it with nn.NLLLoss.
The input to this network has size [batch_size, seq_len],
and my output_size is the vocab_size = 10000.
The output of the network has size [batch_size, seq_len, vocab_size],
which I mostly understand, since the last dimension (vocab_size) holds the probability (or log-probability) that the next word is word X.
What I don't understand is how to create the target. The target size should be [batch_size, vocab_size],
but this is not a one-hot vector (right?), so in theory [batch_size, 1] should be enough (where the last dimension holds the vocabulary index of the next word).
Is there any way to do that? There is no point in using [batch_size, vocab_size] as the target size.
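From the PyTorch docs, nn.NLLLoss indeed takes class *indices* as the target, not one-hot vectors: input of shape [N, C] and target of shape [N]. So for a per-timestep prediction the target can simply be [batch_size, seq_len] of word indices, flattened together with the output. A minimal sketch, with random tensors standing in for the real model output and labels:

```python
import torch
import torch.nn as nn

# hypothetical sizes for illustration
batch_size, seq_len, vocab_size = 4, 7, 10000

# stand-in for the network output: log-probabilities [batch, seq, vocab]
log_probs = torch.log_softmax(torch.randn(batch_size, seq_len, vocab_size), dim=2)

# target: class indices [batch, seq] -- NOT one-hot; each entry is the
# vocabulary index of the correct next word at that position
target = torch.randint(0, vocab_size, (batch_size, seq_len))

# nn.NLLLoss expects input [N, C] and target [N], so flatten batch and seq
criterion = nn.NLLLoss()
loss = criterion(log_probs.view(-1, vocab_size), target.view(-1))
print(loss)  # a scalar
```

Equivalently, nn.NLLLoss also accepts input [N, C, d1, ...] with target [N, d1, ...], so `criterion(log_probs.permute(0, 2, 1), target)` gives the same result without flattening.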