I’m trying to train an LSTM network with a fully connected layer on top of it, but I’m running into issues because I’m not sure whether my model is written correctly or my training procedure is wrong.
The task is binary classification of sequential data of variable length. A batch is a tensor of size `torch.Size([32, 58735, 49])`, for example, where 32 is the batch size, 58735 is the length of the longest sequence in the batch, and 49 is the number of features at each time step `t`. Because the sequences have variable length, I also pass the model a tensor of lengths of size `torch.Size([32])` (one length per example, i.e., equal to the batch size), so that I can use the `pack_padded_sequence` function. My model is written as:
```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(LSTM, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x, lengths):
        bs = x.size(0)
        h0 = torch.zeros((1, bs, self.hidden_size), device=x.device)
        c0 = torch.zeros((1, bs, self.hidden_size), device=x.device)
        # pack_padded_sequence expects lengths sorted in descending order
        input_lengths, perm_idx = lengths.sort(0, descending=True)
        x = x[perm_idx][:, :input_lengths.max()]
        # lengths must live on the CPU for pack_padded_sequence
        x = pack_padded_sequence(x, input_lengths.cpu(), batch_first=True)
        lstm_out, (hn, cn) = self.lstm(x, (h0, c0))
        # Only needed if you want the per-step outputs; unused for classification
        lstm_out, _ = pad_packed_sequence(lstm_out, batch_first=True)
        # Classify from the final hidden state; hn has shape (1, bs, hidden_size)
        out = self.fc(hn.squeeze(0))
        # Undo the sort so the outputs line up with the original label order
        _, unperm_idx = perm_idx.sort(0)
        return out[unperm_idx].squeeze(1)
```
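To sanity-check my understanding of packing, here is a small self-contained example (toy shapes, unrelated to my real data) showing what `pack_padded_sequence` and `pad_packed_sequence` do with a padded batch:

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

# Toy batch: 3 sequences with 2 features each, padded to length 4
x = torch.zeros(3, 4, 2)
x[0, :4] = 1.0  # actual length 4
x[1, :2] = 1.0  # actual length 2
x[2, :1] = 1.0  # actual length 1
lengths = torch.tensor([4, 2, 1])  # sorted descending (or pass enforce_sorted=False)

packed = pack_padded_sequence(x, lengths, batch_first=True)
# batch_sizes says how many sequences are still "alive" at each time step
print(packed.batch_sizes)  # tensor([3, 2, 1, 1])

unpacked, out_lengths = pad_packed_sequence(packed, batch_first=True)
print(unpacked.shape)  # torch.Size([3, 4, 2])
print(out_lengths)     # tensor([4, 2, 1])
```

So the LSTM never sees the padding positions, and unpacking restores the padded layout along with the original lengths.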
The training procedure can be seen in this gist.
The problem is that when training with exactly this procedure and this LSTM model, the training and validation accuracies stay flat from start to finish, at 0.7730 and 0.2699 respectively. Since this is the first time I’ve used `nn.LSTM` and `pack_padded_sequence`, I’m not sure where the problem lies.