I’m working on annotating time series with recurrent neural networks (in particular, I’m trying to replicate Alex Graves’s experiments from his book Supervised Sequence Labelling with Recurrent Neural Networks), but I’m still a bit confused about the seq_len dimension in recurrent layers in PyTorch. As far as I understand, that length corresponds to the length of the unfolding in time for BPTT during training. Since PyTorch models have dynamic computational graphs, I can use the trained model on sequences of different lengths. What happens if I do so? E.g., if I have a network with two LSTM hidden layers, what is the difference, in terms of the state of the LSTM units and their output, between feeding a trained network a sequence of 100 time steps at once and feeding the same network the same time series one sample at a time?
I realize there may be two questions here: one about LSTM networks in general, the other about the specific PyTorch implementation.
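To make the comparison concrete, here is a minimal sketch of the two feeding modes I have in mind (the layer sizes and input data are arbitrary placeholders, not from any real experiment). My expectation is that, as long as the hidden and cell states are carried forward manually between steps, the one-step-at-a-time outputs should match the full-sequence outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network: two stacked LSTM layers (sizes are illustrative only).
lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=2)
lstm.eval()

# A random sequence of 100 time steps, batch size 1.
# Default layout is (seq_len, batch, input_size).
seq = torch.randn(100, 1, 4)

# (a) Feed all 100 time steps at once.
with torch.no_grad():
    out_full, (h_full, c_full) = lstm(seq)

# (b) Feed one time step at a time, explicitly carrying the
#     (hidden, cell) state from each step into the next.
state = None  # None means zero-initialized states
step_outputs = []
with torch.no_grad():
    for t in range(seq.size(0)):
        out_t, state = lstm(seq[t : t + 1], state)
        step_outputs.append(out_t)
out_step = torch.cat(step_outputs, dim=0)

# Both modes should produce the same outputs and final states,
# up to floating-point noise.
print(torch.allclose(out_full, out_step, atol=1e-6))
print(torch.allclose(h_full, state[0], atol=1e-6))
print(torch.allclose(c_full, state[1], atol=1e-6))
```

If I instead called `lstm(seq[t : t + 1])` without passing `state`, the states would be re-zeroed at every step, which I assume is where the two modes would diverge.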