I am currently working on character transliteration. My input tensor has the shape (n_words, char indexes for each word). When I pass this tensor (after padding) into nn.Embedding, it gives me a 3D embedding tensor, and nn.LSTMCell then raises the error: `LSTMCell: Expected input to be 1-D or 2-D but received 3-D tensor`.
How should I think about batch_size and seq_length parameters for my problem?
How do I use embedding vectors for each character in nn.LSTMCell?
Note: My dataset is a list of words in source and target language.
If you want to pass a sentence to an LSTM, you pass in each word of the sentence one at a time. The hidden state starts at zeros; on later words, it contains the hidden output from the previous word's input.
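A minimal sketch of that stepping with nn.LSTMCell (sizes are made up for illustration):

```python
import torch
import torch.nn as nn

input_size, hidden_size = 8, 16
cell = nn.LSTMCell(input_size, hidden_size)

# A toy "sentence" of 5 steps with a batch of 1 (random stand-in features).
seq = torch.randn(5, 1, input_size)

# Hidden and cell state start at zeros...
h = torch.zeros(1, hidden_size)
c = torch.zeros(1, hidden_size)

# ...and the outputs are fed back in at every subsequent step.
for x_t in seq:  # x_t has shape (1, input_size): 2D, as the cell expects
    h, c = cell(x_t, (h, c))
```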
You’ll need to decide if you want to use char indexes or word indexes. You won’t be able to use both on the same layer. You could run two parallel layers, if you want both.
Right, let's say I use char indexes, since my problem is character transliteration. The first 5 words of my training dataset have the following lengths: [3, 4, 2, 5, 4]. Now I will pad them so all words have length 5. But how do I pass this seq_length into nn.LSTMCell?
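For what it's worth, nn.LSTMCell has no seq_length argument; seq_length is just the number of times you call the cell. A padding sketch for those five words (char indexes and PAD value are made up):

```python
import torch

PAD = 0  # assumed padding index

# Five words as char-index lists, with lengths [3, 4, 2, 5, 4]
words = [[1, 2, 3], [4, 5, 6, 7], [8, 9], [1, 3, 5, 7, 9], [2, 4, 6, 8]]
max_len = max(len(w) for w in words)  # 5

padded = torch.tensor([w + [PAD] * (max_len - len(w)) for w in words])
# padded.shape == (5, 5): (batch_size = n_words, seq_length = max_word_len)

# seq_length then shows up only as the loop bound, one cell call per column:
# for t in range(max_len):
#     h, c = cell(embedded[:, t, :], (h, c))
```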
I just typed the above on my phone and haven't tested it. You'd need to put it into an nn.Module class (i.e. a model) for it to work, and then it may need some debugging/tweaking. You could also add some ReLU layers, dropout, etc.
In the above example, you create one-hot vectors from your input characters (e.g. A - Z: 26). Then the Linear layer output should be the size of the target character set (e.g. the Greek alphabet α - ω: 24). Use CrossEntropyLoss on the raw output; it applies log-softmax internally, so you don't need an explicit softmax before it.
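A sketch of that output layer and loss (all sizes and targets assumed; note CrossEntropyLoss takes the raw logits):

```python
import torch
import torch.nn as nn

SRC_CHARS, TGT_CHARS = 26, 24  # e.g. Latin chars in, Greek chars out
linear = nn.Linear(SRC_CHARS, TGT_CHARS)
loss_fn = nn.CrossEntropyLoss()  # applies log-softmax internally

batch = torch.eye(SRC_CHARS)[[0, 3]]  # one-hot rows for chars 0 ('A') and 3 ('D')
logits = linear(batch)                # shape (2, TGT_CHARS)
target = torch.tensor([0, 5])         # made-up target char indexes
loss = loss_fn(logits, target)        # no explicit softmax needed
```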
Then you train on inputs of shape (batch_size, current_char_one_hot). Each batch element is a different example, and you iterate over the character positions. For example, suppose our batch were two words:
Apple
Dog
We would convert the first letter of each word to a one-hot vector: one for A and one for D.
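Concretely, assuming a simple 'A' -> 0 ... 'Z' -> 25 mapping, that first step looks like:

```python
import torch
import torch.nn.functional as F

# First letters of "Apple" and "Dog" under the assumed mapping
first_chars = torch.tensor([ord('A') - ord('A'), ord('D') - ord('A')])  # [0, 3]

one_hot = F.one_hot(first_chars, num_classes=26).float()
# one_hot.shape == (2, 26): (batch_size, current_char_one_hot)
```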
Seems I overlooked part of your question, regarding nn.Embedding.
You can use this instead of one-hot vectors, if you prefer; the function is similar. Only embed the part of the sequence you are sending to the model at the current step.
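That per-step embedding might look like this (vocab size, embedding dim, and batch shape are assumed):

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 26, 10  # assumed sizes
emb = nn.Embedding(vocab_size, embedding_dim)

x = torch.randint(0, vocab_size, (32, 5))  # (batch_size, seq_length) char indexes

t = 0                # current position in the sequence
step = emb(x[:, t])  # embed only this step: shape (32, embedding_dim), 2D
```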
Yeah, this is where the issue lies. If my input x is a 2D tensor of shape [batch_size (n_words), seq_length (max_word_len_batch)], then the columns of this tensor contain the respective char indexes. When I pass this into nn.Embedding(vocab_size, embedding_dim), the output is a 3D tensor of shape (batch_size, seq_length, embedding_dim). However, nn.LSTMCell doesn't accept a 3D tensor.
Sounds like you’re trying to pass it all at once instead of as an iterable sequence. If you want a parallelized model, look at Transformers.
But if you are going to use an RNN like an LSTM, your input at each call should be one step of the sequence (i.e. one word, or here one character). That gives you a 2D tensor at each iteration.
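Putting it together, a sketch of what that loop could look like for your shapes (all sizes assumed), slicing the 3D embedding output one timestep at a time so the cell only ever sees 2D input:

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_size = 30, 10, 16  # assumed sizes
batch_size, seq_length = 5, 7                        # n_words, max_word_len

emb = nn.Embedding(vocab_size, embedding_dim)
cell = nn.LSTMCell(embedding_dim, hidden_size)

x = torch.randint(0, vocab_size, (batch_size, seq_length))  # padded char indexes
embedded = emb(x)  # (batch_size, seq_length, embedding_dim): the 3D tensor

h = torch.zeros(batch_size, hidden_size)
c = torch.zeros(batch_size, hidden_size)

# Feed the cell one timestep at a time; each slice is 2D.
for t in range(seq_length):
    h, c = cell(embedded[:, t, :], (h, c))  # input shape (batch_size, embedding_dim)
```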