Hi,

I have a long sequence that requires the network to retain relatively long-term memory.

I break each sequence into consecutive parts that are fed to the network in the original order. I want to keep the hidden state at the end of each chunk, so that it becomes the initial hidden state for the next one.

The following code describes my training loop.

Is this the correct way to do it?

```
for epoch in range(N_epochs):
    model.hidden = model.init_hidden(bs=batch_size)
    model.last_hidden = None
    start_idx = np.arange(0, samples.shape[1], sub_step)

    for s_idx in start_idx:
        sub_sample = X_batch[:, s_idx:s_idx + sub_step]
        sub_target = Y_batch[:, s_idx:s_idx + sub_step]
        sample_v = torch.autograd.Variable(sub_sample)
        target_v = torch.autograd.Variable(sub_target)

        # restore the hidden state saved at the end of the previous chunk,
        # re-wrapped in fresh Variables so backprop is truncated here
        if model.last_hidden is not None:
            model.hidden = [torch.autograd.Variable(h) for h in model.last_hidden]

        net_output = model(sample_v)
        loss = loss_function(net_output, target_v)
        model.zero_grad()
        loss.backward()
        optimizer.step()

        # save the last hidden state, detached from the graph
        model.last_hidden = [h.data for h in model.hidden]
```
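For comparison, here is a minimal self-contained sketch of the same truncated-BPTT pattern in current PyTorch, where `Variable` is deprecated and `detach()` cuts the graph between chunks. The model, sizes, and data below are made-up placeholders, not your actual setup:

```python
import torch
import torch.nn as nn

# Placeholder sizes and model -- adjust to your own problem.
torch.manual_seed(0)
batch_size, seq_len, sub_step, n_feat, n_hidden = 4, 20, 5, 3, 8
lstm = nn.LSTM(n_feat, n_hidden, batch_first=True)
head = nn.Linear(n_hidden, 1)
optimizer = torch.optim.SGD(
    list(lstm.parameters()) + list(head.parameters()), lr=0.1)
loss_function = nn.MSELoss()

X = torch.randn(batch_size, seq_len, n_feat)
Y = torch.randn(batch_size, seq_len, 1)

hidden = None  # nn.LSTM initializes zero state when hidden is None
for s_idx in range(0, seq_len, sub_step):
    sub_x = X[:, s_idx:s_idx + sub_step]
    sub_y = Y[:, s_idx:s_idx + sub_step]

    # feed the saved state in; the LSTM returns the state after this chunk
    out, hidden = lstm(sub_x, hidden)
    loss = loss_function(head(out), sub_y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # detach so the next chunk does not backprop into this one
    hidden = tuple(h.detach() for h in hidden)
```

The `detach()` call plays the same role as re-wrapping `h.data` in a fresh `Variable`: gradients flow within a chunk but not across chunk boundaries.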