I am trying to predict the next word using an LSTM.
These are my layers:
```python
# define layers
self.embedding = nn.Embedding(num_embeddings=vocab_size, embedding_dim=input_length)
self.lstm = nn.LSTM(input_size=input_length,
                    hidden_size=hidden_dim,
                    num_layers=num_layers,
                    dropout=drop_prob,
                    batch_first=True,   # input and output tensors are (batch, seq, feature)
                    bidirectional=False)
self.fc = nn.Linear(hidden_dim, output_size)
self.dropout = nn.Dropout(p=drop_prob)
self.LogSoftMax = nn.LogSoftmax(dim=-1)  # normalize over the last dim; result size [batch_size x seq_len x output_size]
```
together with nn.NLLLoss as the loss function.
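Put together, the layers sit in a module roughly like the sketch below (the hyperparameter values are placeholders for illustration, not my real ones):

```python
import torch
import torch.nn as nn

# placeholder hyperparameters, for illustration only
vocab_size = 10000   # also the output_size
input_length = 128   # used as the embedding dimension
hidden_dim = 256
num_layers = 2
drop_prob = 0.3

class NextWordLSTM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings=vocab_size,
                                      embedding_dim=input_length)
        self.lstm = nn.LSTM(input_size=input_length,
                            hidden_size=hidden_dim,
                            num_layers=num_layers,
                            dropout=drop_prob,
                            batch_first=True)    # tensors are (batch, seq, feature)
        self.fc = nn.Linear(hidden_dim, vocab_size)
        self.dropout = nn.Dropout(p=drop_prob)
        self.LogSoftMax = nn.LogSoftmax(dim=-1)  # normalize over the vocab dim

    def forward(self, x):                 # x: [batch_size, seq_len] of word indices
        emb = self.embedding(x)           # [batch_size, seq_len, input_length]
        out, _ = self.lstm(emb)           # [batch_size, seq_len, hidden_dim]
        out = self.dropout(out)
        return self.LogSoftMax(self.fc(out))  # [batch_size, seq_len, vocab_size]

model = NextWordLSTM()
x = torch.randint(0, vocab_size, (4, 12))   # batch of 4 sequences, 12 words each
print(model(x).shape)  # torch.Size([4, 12, 10000])
```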
The input to this network has size [batch_size, seq_len],
and my output_size is the vocab_size = 10000.
The output of the network has size [batch_size, seq_len, vocab_size],
which makes sense to me, since the last dim (vocab_size) holds the probability (or log-probability) of each vocabulary word being the next word.
What I don't understand is how to create the target. The target size should be [batch_size, vocab_size],
but this is not a one-hot vector (right?), so in theory [batch_size, 1] should be enough (where the last dim holds the vocabulary index of the next word).
Is there any way to do that? There seems to be no point in using [batch_size, vocab_size] as the target size.
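To make the shapes concrete, here is a small sketch of what I think the loss computation should look like, with integer index targets rather than one-hot vectors (the tensor sizes here are made up, not from my actual data):

```python
import torch
import torch.nn as nn

batch_size, seq_len, vocab_size = 4, 12, 10000

# log-probabilities as produced by the network: [batch_size, seq_len, vocab_size]
log_probs = torch.log_softmax(torch.randn(batch_size, seq_len, vocab_size), dim=-1)

# targets as vocabulary indices, one per position: [batch_size, seq_len]
targets = torch.randint(0, vocab_size, (batch_size, seq_len))

# nn.NLLLoss expects input [N, C] and target [N],
# so batch and sequence dimensions are flattened together
criterion = nn.NLLLoss()
loss = criterion(log_probs.view(-1, vocab_size), targets.view(-1))
print(loss.shape)  # torch.Size([]) -- a scalar
```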