I am currently working on character transliteration. My input tensor shape is: (n_words, char indexes for each word). When I pass this tensor (after padding) into nn.Embedding, I get a 3D embedding tensor, which then fails with the error:
LSTMCell: Expected input to be 1-D or 2-D but received 3-D tensor.
- How should I think about the seq_length parameter for my problem?
- How do I use embedding vectors for each character in nn.LSTMCell?
Note: My dataset is a list of words in source and target language.
nn.LSTMCell takes an input of shape (batch_size, input_size), plus h0 and c0 of shape (batch_size, hidden_size). See: LSTMCell — PyTorch 1.11.0 documentation
If you wanted to pass a sentence to an LSTMCell, you would pass in each word of the sentence one at a time. The hidden state starts as zeros and, on later words, contains the hidden output from the previous word.
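A minimal sketch of that stepping pattern, using random numbers in place of real word or character vectors. The sizes here are arbitrary and only chosen for illustration.

```python
import torch
import torch.nn as nn

# Arbitrary illustrative sizes (assumptions, not from the thread).
batch_size, input_size, hidden_size, seq_len = 2, 8, 16, 5

cell = nn.LSTMCell(input_size, hidden_size)

# Dummy sequence: one (batch_size, input_size) slice per time step.
inputs = torch.randn(seq_len, batch_size, input_size)

# Hidden and cell state start as zeros.
h = torch.zeros(batch_size, hidden_size)
c = torch.zeros(batch_size, hidden_size)

for x_t in inputs:            # x_t has shape (batch_size, input_size)
    h, c = cell(x_t, (h, c))  # the state from this step feeds the next one

print(h.shape)  # torch.Size([2, 16])
```

Each call to the cell only ever sees a 2D tensor; the sequence dimension lives in the Python loop.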
If you’re looking for something more parallelized, try Transformer layers. See: Transformer — PyTorch 1.11.0 documentation
In case of language translation, how do I set the seq_length parameter when I am passing each word into nn.LSTMCell?
You’ll need to decide if you want to use char indexes or word indexes. You won’t be able to use both on the same layer. You could run two parallel layers, if you want both.
Right, let’s say I use char indexes as my problem is character transliteration. My first 5 words of training dataset are of the following lengths: [3, 4, 2, 5, 4]. Now I will be padding them to make all words of length 5. But how do I pass this
Character transliteration is converting one set of characters to a phonetic representation with another set of characters. Correct?
So why denominate by both words and character?
If, for some reason, you wanted the network to access both, you would need two parallel layers.
def forward(self, word_ind, char_ind, word_ind_h0, char_ind_h0):
    # word_ind_h0 and char_ind_h0 are (h, c) tuples; nn.LSTMCell returns the updated (h, c)
    word_h, word_c = self.LSTMCell_word_layer(word_ind, word_ind_h0)
    char_h, char_c = self.LSTMCell_char_layer(char_ind, char_ind_h0)
    x = torch.cat([word_h, char_h], dim=1)
    return x, (word_h, word_c), (char_h, char_c)
I just typed the above on my phone and haven’t tested it. You’d need to put it into an nn.Module class (i.e. a model) for it to work. It may then need some debugging and tweaking, plus some ReLU layers, dropout, etc.
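For reference, here is one way the forward method above might look once wrapped in an nn.Module. The class name, attribute names and sizes are made up for illustration, and the inputs are assumed to be already-encoded float vectors (e.g. embeddings), not raw indexes.

```python
import torch
import torch.nn as nn

class TwoStreamCell(nn.Module):  # hypothetical name
    def __init__(self, word_dim, char_dim, hidden_size):
        super().__init__()
        self.word_cell = nn.LSTMCell(word_dim, hidden_size)
        self.char_cell = nn.LSTMCell(char_dim, hidden_size)

    def forward(self, word_x, char_x, word_state, char_state):
        # Each state is an (h, c) tuple; each cell returns the updated (h, c).
        word_h, word_c = self.word_cell(word_x, word_state)
        char_h, char_c = self.char_cell(char_x, char_state)
        x = torch.cat([word_h, char_h], dim=1)  # (batch, 2 * hidden_size)
        return x, (word_h, word_c), (char_h, char_c)

batch, hidden_size = 3, 4
model = TwoStreamCell(word_dim=10, char_dim=6, hidden_size=hidden_size)

def zero_state():
    return torch.zeros(batch, hidden_size), torch.zeros(batch, hidden_size)

x, word_state, char_state = model(
    torch.randn(batch, 10), torch.randn(batch, 6), zero_state(), zero_state()
)
print(x.shape)  # torch.Size([3, 8])
```

The returned states would be passed back in on the next step of the sequence.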
Let’s say I only pass characters. Not words. How do I use
You’ll need to encode your character data. Pytorch has examples for RNNs.
In the above example, you create one-hot vectors from your input characters (i.e. A - Z: 26). Then the Linear layer output should be the size of the target character set (i.e. α - ω: 24). Pass the raw Linear outputs to nn.CrossEntropyLoss, which applies log-softmax internally, or apply log_softmax yourself and use nn.NLLLoss.
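A rough sketch of one training step along those lines, assuming 26 source characters, 24 target characters, and an nn.LSTMCell between the one-hot input and the Linear layer. The hidden size and the target indexes are made up. Note that nn.CrossEntropyLoss expects raw scores, so no softmax is applied before it here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_source, n_target, hidden_size = 26, 24, 32  # hidden_size is arbitrary

cell = nn.LSTMCell(n_source, hidden_size)
to_target = nn.Linear(hidden_size, n_target)
loss_fn = nn.CrossEntropyLoss()  # takes raw logits, applies log-softmax itself

# One time step for a batch of two source characters: A (0) and D (3).
x_t = F.one_hot(torch.tensor([0, 3]), num_classes=n_source).float()
target_t = torch.tensor([0, 3])  # made-up target character indexes

h = torch.zeros(2, hidden_size)
c = torch.zeros(2, hidden_size)
h, c = cell(x_t, (h, c))

logits = to_target(h)             # shape (2, n_target)
loss = loss_fn(logits, target_t)  # scalar
loss.backward()
```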
Then you train on inputs of size (batch_size, one-hot size), one character position at a time. Each batch element should be a different example, and you iterate over the positions. For example, suppose our batch was of two words, Apple and Dog.
We would change the first letter of each to a one-hot vector. For A and D:
first_batch = F.one_hot(torch.tensor([0, 3]), num_classes=27).float()  # shape (2, 27); F is torch.nn.functional
And you can pad the end of each example with a designated one hot. So 26 letters would become a 27 length one hot vector, etc.
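As a sketch of that padding step, assuming lowercase a-z maps to indexes 0-25 and index 26 is reserved for padding (the encode helper is hypothetical):

```python
import torch
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence

PAD = 26  # assumed padding index; letters use 0..25

def encode(word):
    # Hypothetical helper: map a-z to 0..25.
    return torch.tensor([ord(ch) - ord("a") for ch in word.lower()])

words = ["Apple", "Dog"]
padded = pad_sequence([encode(w) for w in words],
                      batch_first=True, padding_value=PAD)
print(padded)
# tensor([[ 0, 15, 15, 11,  4],
#         [ 3, 14,  6, 26, 26]])

one_hot = F.one_hot(padded, num_classes=27).float()  # (2, 5, 27)
first_batch = one_hot[:, 0]                          # (2, 27): A and D
```

Slicing one_hot[:, t] gives the 2D batch for character position t.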
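Last, if you carry the hidden state over from one training iteration (batch) to the next, don’t forget to .detach() it in between. Don’t detach between the characters of a single word, or gradients won’t flow back through the sequence.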
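A small sketch of where the .detach() call could go, assuming the state is carried from one batch to the next. The data is random and the loss is a placeholder, just to show the control flow.

```python
import torch
import torch.nn as nn

cell = nn.LSTMCell(4, 8)  # arbitrary sizes
opt = torch.optim.SGD(cell.parameters(), lr=0.1)

h = torch.zeros(2, 8)
c = torch.zeros(2, 8)

for _ in range(3):                    # three dummy batches
    # Cut the graph left over from the previous batch.
    h, c = h.detach(), c.detach()
    for x_t in torch.randn(5, 2, 4):  # five time steps of (batch, input)
        h, c = cell(x_t, (h, c))      # no detach inside the sequence
    loss = h.pow(2).mean()            # placeholder loss
    opt.zero_grad()
    loss.backward()
    opt.step()
```

If you reset the state to zeros at the start of every batch instead, there is nothing to detach.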
Seems I overlooked part of your question, regarding nn.Embedding.
You can use this instead of one_hot vectors, if you prefer. The function is similar. Only embed the part in the sequence you are sending to the model now.
Using our previous example, Apple and Dog.
You might send the first characters as such:
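The original snippet is missing here. A possible version, assuming A = 0 and D = 3 as before, a vocabulary of 27 (26 letters plus padding) and an arbitrary embedding size:

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim = 27, 8  # embedding_dim is arbitrary
embed = nn.Embedding(vocab_size, embedding_dim, padding_idx=26)

first_chars = torch.tensor([0, 3])  # A and D
x_t = embed(first_chars)            # shape (2, 8)
print(x_t.shape)  # torch.Size([2, 8])
```

Because only one character per word is embedded, the result is 2D and can go straight into nn.LSTMCell.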
Yeah, this is where the issue lies. If my input x is a 2D tensor of the shape [batch_size (n_words), seq_length (max_word_len_batch)], then the columns of this tensor will contain the respective char indexes. Now when I pass this into nn.Embedding(vocab_size, embedding_dim), the output embedding results in a 3D tensor of the shape (batch_size, seq_length, embedding_dim). However, nn.LSTMCell doesn’t take a 3D tensor.
Sounds like you’re trying to pass it all at once instead of as an iterable sequence. If you want a parallelized model, look at Transformers.
But if you are going to use an RNN, like LSTM, your inputs should be each step of the sequence (i.e. each word or each character). That will give you a 2D tensor for each iteration.
So you’re saying if I use LSTM, I have to use a for loop to complete a sequence?
Right. And then you pass the hidden layers back in to carry the data, or memory, through iterations.
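Putting the thread together, a minimal sketch of that loop for a padded batch of char indexes. The sizes, the padding index 26 and the two example words are assumptions for illustration.

```python
import torch
import torch.nn as nn

vocab_size, embedding_dim, hidden_size, PAD = 27, 8, 16, 26

embed = nn.Embedding(vocab_size, embedding_dim, padding_idx=PAD)
cell = nn.LSTMCell(embedding_dim, hidden_size)

# Padded char indexes, shape (batch_size, seq_length): "apple" and "dog".
x = torch.tensor([[0, 15, 15, 11, 4],
                  [3, 14, 6, PAD, PAD]])
batch_size, seq_length = x.shape

emb = embed(x)  # 3D: (batch_size, seq_length, embedding_dim)

h = torch.zeros(batch_size, hidden_size)
c = torch.zeros(batch_size, hidden_size)
outputs = []
for t in range(seq_length):
    x_t = emb[:, t, :]        # 2D slice: (batch_size, embedding_dim)
    h, c = cell(x_t, (h, c))  # carry the state to the next character
    outputs.append(h)

outputs = torch.stack(outputs, dim=1)  # (batch_size, seq_length, hidden_size)
print(outputs.shape)  # torch.Size([2, 5, 16])
```

So seq_length is just the number of loop iterations, not a parameter of nn.LSTMCell. Two caveats: this sketch still updates the state on padded positions, which you may want to mask out, and nn.LSTM (as opposed to nn.LSTMCell) accepts the whole 3D tensor when built with batch_first=True and runs the loop internally.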
Makes sense. Let me try this.