Hi all, I think I have a misunderstanding of how to use LSTMs. I've read through all the docs and also a bunch of LSTM examples.
I am trying to do something basic: take the output of an LSTM and pass it through a linear layer, but the sizes don't seem to come out properly.
My batch size is 128, and that is what I expect on the final line of my forward, but instead I get back 22036.
```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence


class MyRNN(nn.Module):
    def __init__(self, embed_size, hidden_size, vocab_size, num_layers):
        super(MyRNN, self).__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, num_layers, batch_first=True)
        self.linear = nn.Linear(hidden_size, 4800)
        self.init_weights()

    def init_weights(self):
        """Initialize weights."""
        self.embed.weight.data.uniform_(-0.1, 0.1)
        self.linear.weight.data.uniform_(-0.1, 0.1)
        self.linear.bias.data.fill_(0)

    def forward(self, features, captions, lengths):
        embeddings = self.embed(captions)
        print("embedding size:" + str(embeddings.size()))
        # Prepend the image features as the first timestep.
        embeddings = torch.cat((features.unsqueeze(1), embeddings), 1)
        packed = pack_padded_sequence(embeddings, lengths, batch_first=True)
        rnn_features, _ = self.lstm(packed)
        print("rnn_features:" + str(rnn_features.data.size()))
        outputs = self.linear(rnn_features)
        # output should be of size 128 x 4800, not 22036 x 4800
        return outputs
```
Here are my print statement outputs:
```
captions size:torch.Size([128, 362])
captions size:torch.Size([128, 302])
padded captions size:torch.Size()
embedding size:torch.Size([128, 302, 256])
packed size:torch.Size([22036, 256])
```
It looks like the issue is that I don't understand pack_padded_sequence. The docs say the output is "The returned Variable's data will be of size TxBx*, where T is the length of the longest sequence and B is the batch size. If batch_first is True, the data will be transposed into BxTx* format." But it seems like the packed data is just 22036 x 256??? Why is that?
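I tried a tiny sanity check with made-up shapes (3 sequences instead of my real batch of 128), and the packed data seems to have one row per non-padding timestep, i.e. sum(lengths) rows, not B or BxT:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Toy batch: 3 sequences padded to length 4, feature size 2 (made-up numbers).
lengths = [4, 3, 2]
padded = torch.zeros(3, 4, 2)  # (batch, max_seq_len, features), batch_first=True

packed = pack_padded_sequence(padded, lengths, batch_first=True)

# The packed data has one row per *real* (non-padding) timestep,
# i.e. sum(lengths) rows -- not batch_size and not batch * max_len.
print(packed.data.size())  # torch.Size([9, 2]), since 4 + 3 + 2 = 9
```

So I'm guessing my 22036 is just the total number of real tokens across the 128 captions?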
What is the point of pack_padded_sequence? It seems optional for RNNs: I see some code that uses it and some that doesn't. Is it better to use packed sequences rather than plain tensors?
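From the examples I've seen, the usual pattern seems to be pack, run the LSTM, then pad_packed_sequence to get back a regular padded tensor. Here's a toy sketch of that round trip (all shapes made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5, batch_first=True)

lengths = [4, 2]
padded = torch.randn(2, 4, 2)  # second sequence has 2 padding timesteps

packed = pack_padded_sequence(padded, lengths, batch_first=True)
packed_out, (h_n, c_n) = lstm(packed)

# pad_packed_sequence restores a (batch, max_len, hidden) tensor;
# timesteps beyond each sequence's true length come back as zeros,
# so the padding never contributed to the recurrence.
unpacked, out_lengths = pad_packed_sequence(packed_out, batch_first=True)
print(unpacked.size())  # torch.Size([2, 4, 5])
```

If I understand right, the benefit is that the LSTM never runs on the padding positions, so h_n really is the state at each sequence's last real token.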
It also seems that if you use torch.nn.utils.rnn.pack_padded_sequence() then you don't need to pass h_0 and c_0? It's hard to tell; the docs don't really say.
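A quick check I ran (toy shapes again) suggests h_0 and c_0 are optional either way, and omitting them is the same as passing zero tensors of shape (num_layers, batch, hidden_size):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence

lstm = nn.LSTM(input_size=2, hidden_size=5, num_layers=1, batch_first=True)
packed = pack_padded_sequence(torch.randn(3, 4, 2), [4, 3, 2], batch_first=True)

# Call without an initial state...
out1, (h1, c1) = lstm(packed)

# ...and with an explicit all-zeros initial state.
zeros = torch.zeros(1, 3, 5)  # (num_layers, batch, hidden_size)
out2, (h2, c2) = lstm(packed, (zeros, zeros))

print(torch.allclose(h1, h2))  # True: omitting (h_0, c_0) defaults to zeros
```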
And for the input of the LSTM, the docs say "input (seq_len, batch, input_size): tensor containing the features of the input sequence. The input can also be a packed variable length sequence." Can someone explain the difference between seq_len and input_size?
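If I'm reading the docs right, input_size is the feature dimension of a single timestep, while seq_len is how many timesteps each sequence has. A small shape check with made-up numbers:

```python
import torch
import torch.nn as nn

# input_size is the per-timestep feature dimension (here 10);
# seq_len is the number of timesteps (here 5).
lstm = nn.LSTM(input_size=10, hidden_size=20)  # batch_first=False by default

seq_len, batch, input_size = 5, 3, 10
x = torch.randn(seq_len, batch, input_size)

output, (h_n, c_n) = lstm(x)
print(output.size())  # torch.Size([5, 3, 20]): one hidden vector per timestep
print(h_n.size())     # torch.Size([1, 3, 20]): only the final timestep's state
```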
Any help would be greatly appreciated. I've been stuck on this for a while and have been trying to fix it on my own, since it seems like it should be easy, but nothing I've tried has worked.