I’m working on annotating time series with recurrent neural networks (in particular, I’m trying to replicate Alex Graves’s experiments from his book Supervised Sequence Labelling with Recurrent Neural Networks), but I’m still a bit confused about the seq_len dimension in recurrent layers in PyTorch. As far as I understand, that length corresponds to the length of the unfolding in time for BPTT during training. Since PyTorch models have dynamic computational graphs, I can use the trained model on sequences of different lengths. What happens if I do so? E.g., if I have a network with two LSTM hidden layers, what is the difference, in terms of the state of the LSTM units and their output, between feeding a trained network a sequence of 100 time steps at once and feeding the same network the same time series one sample at a time?
I realize there may be two questions here: one about LSTM networks in general, the other about the specific PyTorch implementation.
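To make the comparison concrete, here is a minimal sketch of the two feeding modes I have in mind (the layer sizes and input data are arbitrary placeholders, not from any real experiment). My expectation is that, as long as the hidden and cell states are carried forward manually between steps, the one-step-at-a-time outputs should match the full-sequence outputs:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy network: two stacked LSTM layers (sizes are illustrative only).
lstm = nn.LSTM(input_size=4, hidden_size=8, num_layers=2)
lstm.eval()

# A random sequence of 100 time steps, batch size 1.
# Default layout is (seq_len, batch, input_size).
seq = torch.randn(100, 1, 4)

# (a) Feed all 100 time steps at once.
with torch.no_grad():
    out_full, (h_full, c_full) = lstm(seq)

# (b) Feed one time step at a time, explicitly carrying the
#     (hidden, cell) state from each step into the next.
state = None  # None means zero-initialized states
step_outputs = []
with torch.no_grad():
    for t in range(seq.size(0)):
        out_t, state = lstm(seq[t : t + 1], state)
        step_outputs.append(out_t)
out_step = torch.cat(step_outputs, dim=0)

# Both modes should produce the same outputs and final states,
# up to floating-point noise.
print(torch.allclose(out_full, out_step, atol=1e-6))
print(torch.allclose(h_full, state[0], atol=1e-6))
print(torch.allclose(c_full, state[1], atol=1e-6))
```

If I instead called `lstm(seq[t : t + 1])` without passing `state`, the states would be re-zeroed at every step, which I assume is where the two modes would diverge.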