Hi,
I am new to working with LSTMs and I have multiple questions that I have not found answers to. If you have answers to any of the following, please respond.
(1):
Given input of shape (batch_size, seq_length, 1) passed through lstm = LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True), the output (output, _ = lstm(input)) is of shape (batch_size, seq_len, hidden_size).
(1.1):
If I then want to make a prediction on this using a fully connected layer, if I understand correctly, I only need the output at the final time step of the sequence. Is this correct?
(1.2):
So this would be fully_connected(output[:, -1, :]). Is this correct?
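For concreteness, here is a minimal sketch of what I have in mind for Q(1) (the sizes are made-up placeholders):

```python
import torch
import torch.nn as nn

batch_size, seq_len, input_size, hidden_size = 4, 10, 1, 32

lstm = nn.LSTM(input_size=input_size, hidden_size=hidden_size, batch_first=True)
fully_connected = nn.Linear(hidden_size, 1)

x = torch.randn(batch_size, seq_len, input_size)
output, _ = lstm(x)                              # (batch_size, seq_len, hidden_size)
prediction = fully_connected(output[:, -1, :])   # last time step only -> (batch_size, 1)
```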
(2):
If the sequences in a batch have different lengths, I can store them in a list: batch = [sequence_1, ..., sequence_n], where each sequence_i is a tensor of shape (seq_length_i, 1). This can be wrapped with batch = torch.nn.utils.rnn.pack_sequence(batch) and then passed through the LSTM described in Q(1). The output, however, is still of type PackedSequence, and I want the output features of each sequence in the batch. I have written the following code:
pad_packed_sequence(a)[0][-1, :, :], where a is the PackedSequence returned by the LSTM.
(2.1):
My interpretation of the above code is that, for each sequence in the batch, I am extracting the LSTM output at the final time step of the entire (padded) sequence. Is this right?
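For reference, here is the full snippet I am experimenting with (the sequence lengths and sizes are made up):

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_sequence, pad_packed_sequence

lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)

# three sequences of different lengths, each of shape (seq_length_i, 1)
batch = [torch.randn(5, 1), torch.randn(3, 1), torch.randn(7, 1)]

packed = pack_sequence(batch, enforce_sorted=False)
a, _ = lstm(packed)                      # a is still a PackedSequence

# pad_packed_sequence returns (padded_output, lengths); with the default
# batch_first=False the padded output is (max_seq_len, batch_size, hidden_size)
padded, lengths = pad_packed_sequence(a)
last = padded[-1, :, :]                  # the indexing from Q(2.1) -> (batch_size, hidden_size)
```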
(2.2):
A more theoretical question: if I pass a packed sequence through an LSTM, will each sequence be processed independently? I.e. will my output be equivalent to the output I would get if I passed each sequence through the LSTM on its own and then concatenated the outputs?
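In other words, continuing from the snippet above, would a check like this (just a sketch of what I mean by "equivalent") pass?

```python
# compare the packed-batch outputs with running each sequence on its own
for i, seq in enumerate(batch):
    out_single, _ = lstm(seq.unsqueeze(0))   # (1, seq_length_i, hidden_size)
    length = lengths[i].item()
    out_packed = padded[:length, i, :]       # the same sequence's outputs from the packed run
    print(torch.allclose(out_single.squeeze(0), out_packed, atol=1e-6))
```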
(3):
If I have a set of sequences with different lengths that I want to process with a 1D convolution before passing them through an LSTM, is there any way I can pass a batch of such sequences through the network? Something of the form: rnn.pack_sequence([seq_1, ..., seq_n]) -> Conv1d -> LSTM -> Linear. Of course a PackedSequence cannot be passed through Conv1d, so is there an alternative?
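To make the question concrete, this is roughly the module I would like to build (the channel count and kernel size are made-up placeholders); my problem is how to feed a batch of variable-length sequences through it:

```python
import torch.nn as nn

class ConvLSTMNet(nn.Module):
    """Hypothetical Conv1d -> LSTM -> Linear model from Q(3)."""

    def __init__(self, hidden_size=32, conv_channels=8, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv1d(in_channels=1, out_channels=conv_channels,
                              kernel_size=kernel_size, padding=kernel_size // 2)
        self.lstm = nn.LSTM(input_size=conv_channels, hidden_size=hidden_size,
                            batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        # x: (batch_size, seq_len, 1) -- this only works if every sequence in the
        # batch has the same length, which is exactly what I cannot assume
        x = self.conv(x.transpose(1, 2)).transpose(1, 2)  # (batch, seq_len, conv_channels)
        output, _ = self.lstm(x)
        return self.fc(output[:, -1, :])
```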
(4):
If I have an architecture that looks like the following: x -> LSTM -> LSTM -> Linear -> prediction, is it common practice to apply an activation function to the outputs of the LSTMs?
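For example (just a sketch to show where I would put it; ReLU is an arbitrary choice):

```python
import torch
import torch.nn as nn

class TwoLayerLSTM(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm1 = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.lstm2 = nn.LSTM(input_size=hidden_size, hidden_size=hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm1(x)
        out = torch.relu(out)    # <- is an activation here common practice?
        out, _ = self.lstm2(out)
        return self.fc(out[:, -1, :])
```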
Thanks in advance!!
P.s. if there are any tutorials out there, I'd also be happy to see those. I have done some digging and it seems like there are very few resources. It would be great if there were more end-to-end tutorials by PyTorch.