I’ve written a basic LSTM classifier and a few more complex models based on this basic code. The basic model takes a batch of sequences, each of shape [200 x 128] (200 time steps, 128 features), and assigns each sequence one of 6 classes. I’m fairly sure this is correct, and the model does learn on the dataset. However, looking at other LSTM implementations, I’ve seen a few things included that I haven’t got, and I’m worried I’m missing something that could be affecting model training.
The basic model is as follows:
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self, num_classes=6, hidden_size=256, steps=200, bidirectional=False):
        super(Net, self).__init__()
        self.bidirectional = bidirectional
        self.hidden_size = hidden_size
        self.LSTM_one = nn.Sequential(
            nn.LSTM(input_size=128, hidden_size=hidden_size, num_layers=1,
                    batch_first=True, bidirectional=bidirectional)
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(in_features=steps * hidden_size, out_features=500),
            nn.ReLU(inplace=True),
            nn.Dropout(0.2),
            nn.Linear(in_features=500, out_features=num_classes)
        )

    def forward(self, x):
        x, hidden = self.LSTM_one(x)   # LSTM output is [batch, steps, hidden_size]
        x = torch.flatten(x, 1)        # flatten to [batch, steps * hidden_size]
        x = self.classifier(x)
        return x, hidden
```
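For context, a quick shape check on just the LSTM part (using a hypothetical random batch of 4 sequences, not from my actual dataset) shows why the classifier's first `Linear` layer expects `steps * hidden_size` inputs:

```python
import torch
import torch.nn as nn

# Same LSTM configuration as in the model above
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1, batch_first=True)

x = torch.randn(4, 200, 128)      # hypothetical batch: [batch, steps, features]
out, (h_n, c_n) = lstm(x)

print(out.shape)                  # [4, 200, 256] -> [batch, steps, hidden_size]
flat = torch.flatten(out, 1)
print(flat.shape)                 # [4, 51200]    -> 200 * 256 = steps * hidden_size
```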
I’ve seen in other places that a separate init function is called that initializes the hidden state of the LSTM, something like:
```python
def init_hidden(self):
    return (torch.zeros(1 + int(self.bidirectional), 1, self.hidden_size),
            torch.zeros(1 + int(self.bidirectional), 1, self.hidden_size))
```
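From what I can tell from the docs, when no hidden state is passed, `nn.LSTM` initializes `(h_0, c_0)` to zeros internally, so an explicit zero init like the above should give identical results to omitting it entirely. A small check with a hypothetical input seems to confirm this:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=128, hidden_size=256, num_layers=1, batch_first=True)
x = torch.randn(4, 200, 128)      # hypothetical batch of 4 sequences

# No hidden state passed: PyTorch defaults (h_0, c_0) to zeros
out_default, _ = lstm(x)

# Explicit zero hidden state: [num_layers * num_directions, batch, hidden_size]
h0 = torch.zeros(1, 4, 256)
c0 = torch.zeros(1, 4, 256)
out_explicit, _ = lstm(x, (h0, c0))

print(torch.allclose(out_default, out_explicit))  # True
```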
and further ones that pass a hidden argument to the forward method, like this:

```python
def forward(self, inputs, hidden):
    output, hidden = self.lstm(inputs.view(1, 1, self.input_size), hidden)
    return output, hidden
```
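As far as I understand, this hidden-passing pattern matters when you feed a sequence one step at a time and want state carried across calls. A sketch with hypothetical sizes (a single 10-step sequence) shows that threading `hidden` through stepwise calls reproduces the single full-sequence call:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=128, hidden_size=64, batch_first=True)
x = torch.randn(1, 10, 128)       # one hypothetical sequence of 10 steps

# Whole sequence in one call
full_out, _ = lstm(x)

# One step at a time, carrying the hidden state between calls
hidden = None                     # None -> zeros, same as an explicit zero init
step_outs = []
for t in range(10):
    out_t, hidden = lstm(x[:, t:t+1, :], hidden)
    step_outs.append(out_t)
stepped = torch.cat(step_outs, dim=1)

print(torch.allclose(full_out, stepped, atol=1e-6))  # True
```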
So my question is: are these two additional steps that my LSTM model is missing necessary, or have they been deprecated?