I want to train an RNN (LSTM) on time-series data, backpropagating at every step while preserving the temporal relationships from the start of the series to the end.

I have tried many things without success. If I set seq_len to the total number of data points, the model trains quickly and preserves the temporal relationships, but it only updates the weights once per epoch.
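For reference, that full-sequence setup looks roughly like this (a sketch with toy shapes; the tensor sizes, hidden size, and loss are made up for illustration):

```
import torch
import torch.nn as nn

# Toy data: one series of T ticks with 4 features each,
# shaped (batch, seq_len, features) for batch_first=True.
T, n_features = 100, 4
X = torch.randn(1, T, n_features)
y = torch.rand(1, T, 1)

lstm = nn.LSTM(n_features, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

# One update per epoch: the whole series is a single sequence, so
# temporal relationships are kept end to end, but the weights only
# change once per pass over the data.
optimizer.zero_grad()
out, _ = lstm(X)                               # out: (1, T, 8)
loss = criterion(torch.sigmoid(head(out)), y)
loss.backward()
optimizer.step()
```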

The last thing I tried was setting both seq_len and batch_size to 1 and passing the hidden state along on every iteration:

```
hidden = None
for i in range(len(X)):
    # Feed a single tick: (batch=1, seq_len, features)
    single_tick = X[i].view(1, X[i].shape[0], X[i].shape[1])
    optimizer.zero_grad()
    y_pred, hidden = net(single_tick, hidden)
    loss = criterion(y_pred, y[i])
    loss.backward(retain_graph=True)
    optimizer.step()
    train_loss_total += loss.item()
```

Note that I set retain_graph to True.

This kind of works, but each epoch takes an extremely long time, to the point where it is almost unusable.

I want to know the common practice for this apparently simple task (preserving temporal relationships across a long dataset). In Keras this works out of the box, so I assume there is no technical limitation, just a gap in my knowledge.
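The closest thing to a middle ground I can imagine is splitting the series into chunks and detaching the hidden state between them, something like the sketch below (chunk_len and the model sizes are made up, and I am not sure this is the accepted approach):

```
import torch
import torch.nn as nn

# Toy series: T ticks with 4 features each.
T, n_features, chunk_len = 100, 4, 20
X = torch.randn(T, n_features)
y = torch.rand(T, 1)

lstm = nn.LSTM(n_features, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
optimizer = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()))
criterion = nn.MSELoss()

hidden = None
updates = 0
for start in range(0, T, chunk_len):
    chunk = X[start:start + chunk_len].unsqueeze(0)    # (1, chunk_len, features)
    target = y[start:start + chunk_len].unsqueeze(0)
    optimizer.zero_grad()
    out, hidden = lstm(chunk, hidden)
    loss = criterion(torch.sigmoid(head(out)), target)
    loss.backward()
    optimizer.step()
    # Detach so the next chunk starts from the current state values,
    # but gradients stop at the chunk boundary -- no retain_graph needed.
    hidden = tuple(h.detach() for h in hidden)
    updates += 1
```

This would give several weight updates per epoch while still carrying the state across the whole series, at the cost of gradients not flowing past each chunk boundary.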

This is my model:

```
import torch
import torch.nn as nn


class Model(nn.Module):
    def __init__(self, input_size, num_layers=2, hidden_size=256):
        super(Model, self).__init__()
        self.input_size = input_size
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        self.lstm = nn.LSTM(self.input_size, hidden_size=self.hidden_size,
                            num_layers=self.num_layers, dropout=0.2,
                            batch_first=True)
        self.dense = nn.Linear(self.hidden_size, 1)
        self.activation = nn.Sigmoid()

    def forward(self, x, hidden=None):
        batch_size = x.shape[0]
        if hidden is None:
            # Random initial states on the first call
            h0 = torch.randn(self.num_layers, batch_size, self.hidden_size)
            c0 = torch.randn(self.num_layers, batch_size, self.hidden_size)
        else:
            (h0, c0) = hidden
        output, hidden = self.lstm(x, (h0, c0))
        # Assumes seq_len == 1: collapse (batch, 1, hidden) to (batch, hidden)
        output = output.view(batch_size, self.hidden_size)
        output = self.activation(self.dense(output))
        return output, hidden
```