Should we avoid using one-hot encoding in neural networks?


I am trying to reproduce the results shown in this tutorial. However, when I use torch.nn.RNN (the tutorial's author uses an RNN he wrote himself), the output score and the loss are both constantly 0.


Classify names by their origin (language) using an RNN.

Data Preprocessing

I have a set of text files, each containing many names from the same language (the label). I one-hot encoded all the names to form a dataset in which each entry looks like ((L, D), 1), where L is the number of characters in the name, D is the dimension of the one-hot representation, and 1 is the class label.

In my case, D is 57 and there are 18 classes. So for a name like “Mona”, the corresponding entry has shape ((4, 57), 1).
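
To make the encoding concrete, here is a minimal sketch of how one entry could be built. The alphabet ALL_LETTERS (ASCII letters plus five punctuation symbols, which gives 57) and the helper name_to_one_hot are assumptions for illustration, not necessarily what the tutorial or my preprocessing script uses.

```python
import string

import torch
import torch.nn.functional as F

# Assumed alphabet: 52 ASCII letters + 5 extra symbols = 57 characters.
ALL_LETTERS = string.ascii_letters + " .,;'"
D = len(ALL_LETTERS)


def name_to_one_hot(name):
    """Return an (L, D) float tensor with exactly one 1.0 per row.

    Hypothetical helper; raises ValueError for characters outside ALL_LETTERS.
    """
    indices = torch.tensor([ALL_LETTERS.index(ch) for ch in name])
    return F.one_hot(indices, num_classes=D).float()


x = name_to_one_hot("Mona")
entry = (x, 1)  # (one-hot matrix, class label)
print(x.shape)  # torch.Size([4, 57])
```

Each row has a single non-zero value, so the matrix is just a different way of writing the sequence of character indices.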

Model and Training Loop

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class RNNNameClassifier(nn.Module):
  def __init__(self, input_size, hidden_size, output_size, **kwargs):
    super(RNNNameClassifier, self).__init__()
    self.hidden_size = hidden_size

    self.hidden = self.init_hidden()
    self.rnn = nn.RNN(input_size=input_size, hidden_size=hidden_size, **kwargs)
    self.classifier = nn.Linear(hidden_size, output_size)

  def init_hidden(self):
    # Initial hidden state: (num_layers, batch, hidden_size)
    return torch.zeros(1, 1, self.hidden_size)

  def forward(self, embedding):
    # embedding: (L, 1, input_size); RNN output: (L, 1, hidden_size)
    output, self.hidden = self.rnn(embedding, self.hidden)
    output = self.classifier(output)  # (L, 1, output_size)
    output = F.log_softmax(output, dim=1)

    return output

model = RNNNameClassifier(input_size=EMBEDDING_DIM, hidden_size=HIDDEN_DIM, output_size=OUTPUT_DIM)
optimizer = optim.Adam(model.parameters(), lr=LR)
criterion = nn.NLLLoss()
loss_list = list()
for epoch in range(MAX_EPOCH):
  print("epoch: %d / %d" % (epoch + 1, MAX_EPOCH))
  for i, (X_train, y_train) in enumerate(train_dataset):
    X_train = torch.tensor(X_train, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.int64)

    # Reset the hidden state before each name
    model.hidden = model.init_hidden()

    optimizer.zero_grad()
    # Reshape to (L, 1, D): sequence length, batch of 1, one-hot dimension
    score = model(X_train.view(X_train.size(0), 1, -1))
    # Use only the prediction at the last time step
    loss = criterion(input=score[-1], target=y_train.view(1))
    loss.backward()
    optimizer.step()
    loss_list.append(loss.item())

    if (i + 1) % PRINT_FREQ == 0:
      print("\tloss: %.5f" % loss_list[-1])
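
A quick shape check that might help narrow this down. This is only a sketch under stated assumptions: the random tensor logits stands in for the classifier output of shape (L, 1, 18), and it compares log_softmax over dim=1 (the batch axis, which has size 1 here) with log_softmax over the last axis (the 18 classes).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
L, C = 4, 18  # sequence length of "Mona" and number of classes from the post

# Stand-in for the classifier output: (seq_len, batch, classes)
logits = torch.randn(L, 1, C)

# Normalising over an axis of size 1 leaves a single "class" with probability 1,
# so every log-probability is log(1) = 0.
over_batch = F.log_softmax(logits, dim=1)

# Normalising over the last axis gives a distribution over the 18 classes.
over_classes = F.log_softmax(logits, dim=-1)

target = torch.tensor([1])
loss_over_batch = F.nll_loss(over_batch[-1], target)
loss_over_classes = F.nll_loss(over_classes[-1], target)

print(over_batch.abs().max().item())
print(loss_over_batch.item(), loss_over_classes.item())
```

If the scores in my loop behave like over_batch, an all-zero score and a zero loss would follow regardless of how the input is encoded.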


I am not sure whether the issue comes from an error in my implementation or from something else. One potential problem might be that I should not use one-hot encoding at all, and that learned embeddings (for example, fine-tuned word2vec vectors) are required instead.
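
For context, my current understanding is that multiplying a one-hot vector by a weight matrix is equivalent to an embedding lookup. The sketch below is an illustration, not something from the tutorial; the hidden size H and the character indices are arbitrary, hypothetical values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
D, H = 57, 8  # D from my data; H is an arbitrary illustrative size

embedding = nn.Embedding(D, H)
indices = torch.tensor([3, 10, 25, 0])  # hypothetical character indices

# One-hot rows select the matching rows of the weight matrix.
one_hot = F.one_hot(indices, num_classes=D).float()  # (4, 57)
via_matmul = one_hot @ embedding.weight              # (4, H)

# The embedding layer looks up the same rows directly.
via_lookup = embedding(indices)                      # (4, H)

same = torch.allclose(via_matmul, via_lookup)
print(same)
```

If that is right, one-hot input is mainly less memory-efficient than an embedding layer and should not by itself make the loss collapse to 0.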

Do I understand it correctly? Thank you in advance!