So, there is one set of weights for both layers (LSTM instances) as a whole, but each layer (LSTM instance) has access only to its own corresponding part of it during training or inference, right?
In your code there is only one LSTM instance (`self.lstm`), and since you use it twice, it uses exactly the same weights both times.
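A minimal sketch of what "reusing one instance" means (sizes are hypothetical, not from the original code): a single `nn.LSTM` called twice in `forward`. Note that for the same module to be applied to its own output, `hidden_size` must equal `input_size`. Listing the model's parameters shows there is only one set of LSTM weights, no matter how many times `forward` calls it.

```python
import torch
import torch.nn as nn

class Reuser(nn.Module):
    """Sketch: one LSTM instance applied twice in forward."""
    def __init__(self):
        super().__init__()
        # hidden_size == input_size so the output can be fed back in
        self.lstm = nn.LSTM(input_size=5, hidden_size=5)

    def forward(self, x):
        out, _ = self.lstm(x)    # first pass
        out, _ = self.lstm(out)  # second pass: exactly the same weights
        return out

model = Reuser()
# Only one set of LSTM parameters exists, however often forward uses it.
names = [n for n, _ in model.named_parameters()]
print(names)
# ['lstm.weight_ih_l0', 'lstm.weight_hh_l0', 'lstm.bias_ih_l0', 'lstm.bias_hh_l0']
```

Gradients from both calls accumulate into that single parameter set during backpropagation.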
I got one more question.
I added a couple of lines to the code (it runs without errors) and got modules like this:
LSTMTagger(
(word_embeddings): Embedding(9, 5)
(lstm): LSTM(5, 9)
(lstm2): LSTM(9, 15)
(hidden2tag): Linear(in_features=15, out_features=3)
)
Embedding(9, 5)
LSTM(5, 9)
LSTM(9, 15)
Linear(in_features=15, out_features=3)
But before, I got:
LSTMTagger(
(word_embeddings): Embedding(9, 6)
(lstm): LSTM(6, 9)
(lstm2): LSTM(6, 9)
(hidden2tag): Linear(in_features=9, out_features=3)
)
Embedding(9, 6)
LSTM(6, 9)
LSTM(6, 9)
Linear(in_features=9, out_features=3)
So, as you can see, here one LSTM module whose output size is 9 is fed into another whose input size is 6, and it still works.
I can't understand why this example throws no errors and works well despite the mismatched input and output dimensions.
It looks a bit fishy to me.
Perhaps it takes only the first 6 values as input to the second LSTM, but I'm not sure.
- The second layer should have `input_size` equal to the `hidden_size` of the first layer.
- The printed description of the model corresponds to the order in which you declare the layers in `__init__`, but `forward` can use the layers in a different order. That is why the printed description can sometimes look incoherent even when the model still works.
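To make the chaining rule concrete, here is a sketch of a correctly wired two-LSTM tagger. The layer sizes are taken from the first printout above; the `forward` wiring is an assumption about the intended data flow. Note that PyTorch checks shapes only when `forward` actually runs a tensor through a layer, so a model with mismatched declared sizes still constructs and prints fine as long as `forward` never feeds one layer's output into the incompatible one.

```python
import torch
import torch.nn as nn

class TwoLayerTagger(nn.Module):
    """Sketch: two chained LSTMs with matching sizes
    (Embedding(9, 5) -> LSTM(5, 9) -> LSTM(9, 15) -> Linear(15, 3))."""
    def __init__(self):
        super().__init__()
        self.word_embeddings = nn.Embedding(9, 5)
        self.lstm = nn.LSTM(5, 9)    # input_size 5 = embedding dim
        self.lstm2 = nn.LSTM(9, 15)  # input_size 9 = hidden_size of self.lstm
        self.hidden2tag = nn.Linear(15, 3)

    def forward(self, sentence):
        emb = self.word_embeddings(sentence).view(len(sentence), 1, -1)
        out, _ = self.lstm(emb)   # (seq_len, 1, 9)
        out, _ = self.lstm2(out)  # (seq_len, 1, 15)
        return self.hidden2tag(out.view(len(sentence), -1))  # (seq_len, 3)

tagger = TwoLayerTagger()
scores = tagger(torch.tensor([0, 1, 2, 3]))
print(scores.shape)  # torch.Size([4, 3])
```

If the sizes did not match (say, feeding the 9-dimensional output into an LSTM declared with `input_size=6`), the error would only surface as a `RuntimeError` during this forward call, never at construction or when printing the model.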