Hi all, I think I have a misunderstanding of how to use LSTMs. I've read through all the docs and also a bunch of LSTM examples.
I am trying to do something basic: take the output of an LSTM and pass it through a linear layer, but the sizes don't seem to come out properly.
My batch size is 128, and that is what I expect on the final line of my forward, but instead I get back 22036.
```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class MyRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers):
        super(MyRNN, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, 4800)
        self.init_weights()

    def init_weights(self):
        """Initialize weights."""
        self.embed.weight.data.uniform_(-0.1, 0.1)
        self.linear.weight.data.uniform_(-0.1, 0.1)
        self.linear.bias.data.fill_(0)

    def forward(self, features, captions, lengths):
        embeddings = self.embed(captions)
        print("embedding size:" + str(embeddings.size()))
        # Prepend the image features as the first timestep.
        embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
        packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
        rnn_features, _ = self.lstm(packed)
        print("rnn_features:" + str(rnn_features.data.size()))
        outputs = self.linear(rnn_features)
        # output should be of size 128 x 4800, not 22036 x 4800
        return outputs
```
Here are my print statement outputs:
```
captions size:torch.Size([128, 362])
captions size:torch.Size([128, 302])
padded captions size:torch.Size()
embedding size:torch.Size([128, 302, 256])
packed size:torch.Size([22036, 256])
```
It looks like the issue is that I don't understand pack_padded_sequence. The docs say the output is "The returned Variable's data will be of size TxBx*, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into BxTx* format." But it seems like the packed data is just 22036 x 256??? Why is that?
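I tried a tiny sanity check with made-up shapes (3 sequences instead of my real batch of 128), and the packed data seems to have one row per non-padding timestep, i.e. sum(lengths) rows, not B or BxT:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Toy batch: 3 sequences padded to length 4, feature size 2 (made-up numbers).
lengths = [4, 3, 2]
padded = torch.zeros(3, 4, 2)  # (batch, max_seq_len, features), batch_first=True

packed = pack_padded_sequence(padded, lengths, batch_first=True)

# The packed data has one row per *real* (non-padding) timestep,
# i.e. sum(lengths) rows -- not batch_size and not batch * max_len.
print(packed.data.size())  # torch.Size([9, 2]), since 4 + 3 + 2 = 9
```

So I'm guessing my 22036 is just the total number of real tokens across the 128 captions?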
What is the point of pack_padded_sequence? It seems optional for RNNs: I see some code that uses it and some that doesn't. Is it better to use packed sequences rather than plain tensors?
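From the examples I've seen, the usual pattern seems to be pack, run the LSTM, then pad_packed_sequence to get back a regular padded tensor. Here's a toy sketch of that round trip (all shapes made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5, batch_first=True)

lengths = [4, 2]
padded = torch.randn(2, 4, 2)  # second sequence has 2 padding timesteps

packed = pack_padded_sequence(padded, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# pad_packed_sequence restores a (batch, max_len, hidden) tensor;
# timesteps beyond each sequence's true length come back as zeros,
# so the padding never contributed to the recurrence.
unpacked, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(unpacked.size())  # torch.Size([2, 4, 5])
```

If I understand right, the benefit is that the LSTM never runs on the padding positions, so h_n really is the state at each sequence's last real token.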
It also seems that if you use torch.nn.utils.rnn.pack_padded_sequence() then you don't need to pass h_0 and c_0? It's hard to tell; the docs don't really say.
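A quick check I ran (toy shapes again) suggests h_0 and c_0 are optional either way, and omitting them is the same as passing zero tensors of shape (num_layers, batch, hidden_size):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5, num_layers=1, batch_first=True)
packed = pack_padded_sequence(torch.randn(3, 4, 2), [4, 3, 2], batch_first=True)

# Call without an initial state...
out1, (h1, c1) = lstm(packed)

# ...and with an explicit all-zeros initial state.
zeros = torch.zeros(1, 3, 5)  # (num_layers, batch, hidden_size)
out2, (h2, c2) = lstm(packed, (zeros, zeros))

print(torch.allclose(h1, h2))  # True: omitting (h_0, c_0) defaults to zeros
```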
And for the input of the LSTM, the docs say "input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence." Can someone explain the difference between seq_len and input_size?
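If I'm reading the docs right, input_size is the feature dimension of a single timestep, while seq_len is how many timesteps each sequence has. A small shape check with made-up numbers:

```python
import torch
import torch.nn as nn

# input_size is the per-timestep feature dimension (here 10);
# seq_len is the number of timesteps (here 5).
lstm = nn.LSTM(input_size=10, hidden_size=20)  # batch_first=False by default

seq_len, batch, input_size = 5, 3, 10
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.size())  # torch.Size([5, 3, 20]): one hidden vector per timestep
print(h_n.size())     # torch.Size([1, 3, 20]): only the final timestep's state
```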
Any help would be greatly appreciated. I've been stuck on this for a while and have been trying to fix it on my own, since it seems like it should be easy, but nothing I've tried has worked.