Understanding nn.NLLLoss target size

I am trying to predict the next word using an LSTM.

These are my layers:

    # define layers
    self.embedding = nn.Embedding(num_embeddings=vocab_size,
                                  embedding_dim=input_length)
    self.lstm = nn.LSTM(input_size=input_length,
                        hidden_size=hidden_dim,
                        num_layers=num_layers,
                        dropout=drop_prob,
                        batch_first=True,   # input and output tensors are (batch, seq, feature)
                        bidirectional=False)
    self.fc = nn.Linear(hidden_dim, output_size)
    self.dropout = nn.Dropout(p=drop_prob)
    self.LogSoftMax = nn.LogSoftmax(dim=1)  # result size: [batch_size, seq_len, output_size]

I use nn.NLLLoss as the loss function.
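
Here is a minimal, self-contained version of the model that reproduces the shapes I describe below (the forward pass is my simplified guess of the relevant part, and the hyperparameter values other than the vocabulary size are made up):

    import torch
    import torch.nn as nn

    class NextWordLSTM(nn.Module):
        def __init__(self, vocab_size, input_length, hidden_dim, num_layers, output_size, drop_prob):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, input_length)
            self.lstm = nn.LSTM(input_length, hidden_dim, num_layers,
                                dropout=drop_prob, batch_first=True)
            self.fc = nn.Linear(hidden_dim, output_size)
            self.dropout = nn.Dropout(p=drop_prob)
            self.LogSoftMax = nn.LogSoftmax(dim=1)  # same as in my layer list above

        def forward(self, x, hidden=None):
            # x: [batch_size, seq_len] of word indices
            emb = self.embedding(x)               # [batch_size, seq_len, input_length]
            out, hidden = self.lstm(emb, hidden)  # [batch_size, seq_len, hidden_dim]
            out = self.fc(self.dropout(out))      # [batch_size, seq_len, output_size]
            return self.LogSoftMax(out), hidden

    # vocab_size = output_size = 10000 as in my setup; the other numbers are placeholders
    model = NextWordLSTM(vocab_size=10000, input_length=128, hidden_dim=256,
                         num_layers=2, output_size=10000, drop_prob=0.3)
    x = torch.randint(0, 10000, (4, 20))  # [batch_size, seq_len]
    out, _ = model(x)
    print(out.shape)                      # torch.Size([4, 20, 10000])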

The input to this network has size [batch_size, seq_len], and my output_size is the vocab_size = 10000.

The output of the network has size [batch_size, seq_len, vocab_size], which I mostly understand: the last dimension (vocab_size) holds the probability (or log-probability) that the next word is word X.
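
In other words, I read the prediction per position off the last dimension, something like this (a random tensor stands in for the network output):

    import torch

    batch_size, seq_len, vocab_size = 4, 20, 10000
    log_probs = torch.randn(batch_size, seq_len, vocab_size)  # stand-in for the network output

    # for every position, the vocabulary index with the highest (log-)probability
    predicted_ids = log_probs.argmax(dim=-1)
    print(predicted_ids.shape)  # torch.Size([4, 20])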

What I don't understand is how to create the target. Naively the target size should be [batch_size, vocab_size], but the target is not a one-hot vector (right?), so in theory [batch_size, 1] should be enough, with the last dimension holding the vocabulary index of the next word. Is there any way to do that? There seems to be no point in using [batch_size, vocab_size] as the target size.
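
To show what I mean: in the plain 2-D case that I do understand, nn.NLLLoss takes the target as plain class indices rather than one-hot vectors (toy numbers below):

    import torch
    import torch.nn as nn

    batch_size, vocab_size = 4, 10000
    criterion = nn.NLLLoss()

    # 2-D case: input [batch_size, vocab_size], target [batch_size] of word indices (no one-hot)
    log_probs = nn.LogSoftmax(dim=1)(torch.randn(batch_size, vocab_size))
    target = torch.randint(0, vocab_size, (batch_size,))
    print(criterion(log_probs, target))

    # but my network output is [batch_size, seq_len, vocab_size],
    # so what shape should the target have, and how do I build it?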