Binary Classification of Time Series: When to reset LSTM hidden/cell state?

ppmt · April 1, 2020, 7:35am

I am working on a binary classifier for time series data with one feature. However, I am unsure when exactly to reset the hidden and cell states and why.

Currently I am using the following classifier:

import torch
import torch.nn as nn


class LSTMClassifier(nn.Module):
    def __init__(self,
                 input_features_size=1,
                 hidden_layer_size=30,
                 num_layers=1,
                 output_size=2):
        super().__init__()
        self.lstm = nn.LSTM(input_size=input_features_size,
                            hidden_size=hidden_layer_size,
                            num_layers=num_layers,
                            batch_first=True)
        self.linear = nn.Linear(hidden_layer_size, output_size)
        # number of features in hidden layer
        self.hidden_layer_size = hidden_layer_size
        # number of input features (i.e., features per sequence point)
        self.input_features_size = input_features_size
        # number of layers
        self.num_layers = num_layers
        # size of batches used in gradient descent
        self.batch_size = None
        # tuple with hidden state (a.k.a. short term memory) and cell state (a.k.a. long term memory)
        self.hidden_cell = (None, None)

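    def init_hidden(self, batch_size, device=None, dtype=None):
        # zero-initialise (h_0, c_0); device/dtype should match the input batch
        self.batch_size = batch_size
        shape = (self.num_layers, batch_size, self.hidden_layer_size)
        self.hidden_cell = (torch.zeros(shape, device=device, dtype=dtype),
                            torch.zeros(shape, device=device, dtype=dtype))

    def forward(self, x_batch):
        # x_batch: (batch, seq_len, input_features_size) because batch_first=True
        batch_size = x_batch.shape[0]
        # reset the states so nothing carries over from the previous batch
        self.init_hidden(batch_size, device=x_batch.device, dtype=x_batch.dtype)
        lstm_out, self.hidden_cell = self.lstm(x_batch, self.hidden_cell)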
        predictions = self.linear(lstm_out[:, -1, :])
        return predictions

The hidden and cell states are reset before every forward pass. My reasoning is that the individual time series instances are independent of each other, so the information kept from one instance is not relevant for the next. I suppose that what to keep in the hidden and cell states while parsing an instance is learned by updating the weights of the individual gates, not by carrying the states over between instances.
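As a small sanity check of the per-pass reset, the sketch below compares passing explicit zero states to `nn.LSTM` with passing no states at all, since `nn.LSTM` defaults both `h_0` and `c_0` to zeros when they are omitted. The batch size of 8 and sequence length of 50 are assumptions chosen for illustration; the layer sizes mirror the classifier above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Same sizes as the classifier above: 1 input feature, 30 hidden units, 1 layer
lstm = nn.LSTM(input_size=1, hidden_size=30, num_layers=1, batch_first=True)

# Hypothetical batch: 8 independent series, 50 time steps, 1 feature each
x = torch.randn(8, 50, 1)

# Explicit zero states with shape (num_layers, batch, hidden_size)
zero_states = (torch.zeros(1, 8, 30), torch.zeros(1, 8, 30))

with torch.no_grad():
    out_explicit, _ = lstm(x, zero_states)  # states reset by hand
    out_default, _ = lstm(x)                # states omitted, so zeros are used

# True if both calls produce the same output
same = torch.allclose(out_explicit, out_default)
print(same)
```

If the two outputs match, resetting to zeros in every forward pass behaves the same as simply calling `self.lstm(x_batch)` without a state argument.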

Is this reasoning correct?

On different sites I have found examples that reset the hidden state only once before training, once per epoch, or before every forward pass, which leaves me quite confused.
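The practical difference between these patterns is whether state left over from one batch feeds into the next. Below is a minimal sketch of that effect, assuming toy sizes and random data (none of these values come from the classifier above): the same batch `b` is run once from fresh zero states and once from the states left behind by an unrelated batch `a`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy LSTM; sizes are arbitrary and only for illustration
lstm = nn.LSTM(input_size=1, hidden_size=4, num_layers=1, batch_first=True)

# Two unrelated batches of 2 series with 5 time steps each
a = torch.randn(2, 5, 1)
b = torch.randn(2, 5, 1)

with torch.no_grad():
    # Reset before every forward pass: b starts from zero states
    out_b_fresh, _ = lstm(b)

    # No reset between batches: b starts from the states left by a
    _, state_after_a = lstm(a)
    out_b_carried, _ = lstm(b, state_after_a)

# True if the leftover state changed the output for b
state_leaks = not torch.allclose(out_b_fresh, out_b_carried)
print(state_leaks)
```

When the series are independent, that dependence on the previous batch is presumably unwanted. Carrying state over is usually seen where consecutive batches are contiguous chunks of one long sequence, and in that case the carried state is commonly detached (for example with `detach()`) so that gradients do not flow back into earlier batches.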

Cheers,
ppmt