I want to train an RNN (LSTM) on time series data, updating the weights at every step while keeping the temporal relationships from start to end.
I have tried many things without success. If I set seq_len to the number of data points I have, the model trains quickly and keeps the temporal relationships, but the weights are only updated once per epoch.
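For concreteness, that attempt looked roughly like this (a sketch; in this mode the model has to emit one prediction per timestep, which the seq_len == 1 version at the bottom doesn't do as written):

```python
# Whole series as a single sequence: gradients flow from start to end,
# but optimizer.step() runs only once per epoch.
optimizer.zero_grad()
y_pred, _ = net(X.view(1, len(X), -1))  # (batch=1, seq_len=len(X), features)
loss = criterion(y_pred.view(-1, 1), y.view(-1, 1))
loss.backward()
optimizer.step()
```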
The last thing I tried was setting seq_len and batch_size to 1 and passing the hidden state from one iteration to the next:
```python
hidden = None
for i in range(0, len(X)):
    single_tick = X[i].view(1, 1, -1)  # (batch=1, seq_len=1, features)
    y_pred, hidden = net(single_tick, hidden)
    loss = criterion(y_pred, y[i])
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
    train_loss_total += loss.item()
```
Note that I set retain_graph to True.
This kind of works, but every epoch takes an extremely long time, to the point where it's almost unusable.
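My understanding (which may be wrong) is that retain_graph=True keeps the computation graph of every previous step alive, so the backward pass at step t effectively runs through steps 1..t and gets slower as the epoch progresses. The alternative I keep seeing mentioned is to detach the hidden state instead, which carries the state values forward but cuts the gradient at the step boundary:

```python
# Instead of retain_graph=True: detach (h, c) so each backward() only
# covers one step. State values still flow forward, gradients do not.
hidden = tuple(h.detach() for h in hidden)
```

But then I'm not sure the start-to-end time relationships are really learned, which is the whole point.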
I want to know what the common practice is for this apparently simple task (keeping temporal relationships across long datasets). In Keras this works without having to do anything special, so I imagine there is no technical limitation, just a gap in my knowledge.
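For reference, this is roughly the Keras setup I'm comparing against (from memory, so treat it as a sketch; stateful=True is what carries the state across batches there, and n_features is just a placeholder for the input width):

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.LSTM(256, stateful=True, batch_input_shape=(1, 1, n_features)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")
# shuffle=False so the ticks stay in chronological order
model.fit(X, y, batch_size=1, epochs=10, shuffle=False)
```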
This is my model:
```python
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, input_size, num_layers=2, hidden_size=256):
        super(Model, self).__init__()
        self.input_size = input_size
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(self.input_size,
                            hidden_size=self.hidden_size,
                            num_layers=self.num_layers,
                            dropout=0.2,
                            batch_first=True)
        self.dense = nn.Linear(self.hidden_size, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x, hidden=None):
        batch_size = x.shape[0]
        if hidden is None:
            # Fresh random state at the start of a sequence
            h0 = torch.randn(self.num_layers, batch_size, self.hidden_size)
            c0 = torch.randn(self.num_layers, batch_size, self.hidden_size)
        else:
            (h0, c0) = hidden
        output, hidden = self.lstm(x, (h0, c0))
        output = output.view(batch_size, self.hidden_size)  # assumes seq_len == 1
        output = self.activation(self.dense(output))
        return output, hidden
```
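For completeness, the closest thing to a standard recipe I've come across so far is truncated backpropagation through time: split the series into chunks, carry the hidden state across chunks, but detach it so each backward pass only spans one chunk. A rough, untested sketch (k is a window size I'd have to tune, and the view(batch_size, self.hidden_size) line in forward would need to become something like output[:, -1, :] to accept seq_len > 1):

```python
k = 50  # truncation window, hypothetical value
hidden = None
for start in range(0, len(X) - k + 1, k):
    chunk = X[start:start + k].view(1, k, -1)       # (batch=1, seq_len=k, features)
    if hidden is not None:
        hidden = tuple(h.detach() for h in hidden)  # carry state, cut the graph
    y_pred, hidden = net(chunk, hidden)
    loss = criterion(y_pred, y[start + k - 1])      # target at the chunk's last tick
    optimizer.zero_grad()
    loss.backward()                                  # spans only this chunk
    optimizer.step()
```

Is this the accepted way, or is there something better for genuinely long-range dependencies?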