I have used the PyTorch basic example for sequence generation (see https://github.com/pytorch/examples/tree/master/time_sequence_prediction). I would like to create a version of this example that I could extend to bigger problems that might not fit in memory. I have therefore re-created it as a notebook that loads the data in batches and uses an Adam optimizer.
My findings are:
- Training seems to behave well, but although the RMSE decreases, my prediction for the next time step does not move in the right direction. The model is not really learning the next step; it more or less copies the input, which gives a rather low RMSE since I feed the values at [t-n; t-1] to predict t (n being my BPTT length). A visualization of the predictions shows they are clearly closer to the input than to the target.
- Using the model for generation does not work at all. I guess this is because even point 1 does not work properly (the model should at least predict better than copying the input). I tried a couple of things, including adding noise to the input to force the model to rely more on its hidden state. I also added a "teacher forcing" parameter that lets the model train on its own output as input, hoping this would improve the generation task.
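To make the "copying the input" symptom concrete, here is the kind of sanity check I have in mind: compare the model's RMSE against the trivial "echo the last input" baseline. The sine data here is a stand-in for the real dataset, and `rmse` is a hypothetical helper, not code from the example repo.

```python
import numpy as np

def rmse(pred, target):
    """Root mean squared error between two arrays."""
    return float(np.sqrt(np.mean((pred - target) ** 2)))

# Toy sine sequence standing in for the real training data (assumption).
t = np.linspace(0, 20 * np.pi, 2000)
series = np.sin(t)

inputs = series[:-1]   # value at step t-1
targets = series[1:]   # value at step t

# RMSE obtained by simply echoing the previous value.
copy_baseline = rmse(inputs, targets)
print(f"copy-last-input RMSE: {copy_baseline:.4f}")
```

Any model that is actually learning the dynamics should beat this baseline on held-out data; if its RMSE sits at roughly the same level, it has likely converged to the identity map.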
So far nothing seems to work quite right. Do you have any idea what I could try to get this simple problem to work?
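For reference, this is roughly how I mix the model's own outputs back into the input during training (a scheduled-sampling-style scheme). It is only a sketch: `dummy_step` is a placeholder for one RNN step, and the function and parameter names are mine, not from the example repo.

```python
import numpy as np

rng = np.random.default_rng(0)

def dummy_step(x):
    # Stand-in for one recurrent step of the model (assumption):
    # here just a damped copy of the input.
    return 0.9 * x

def rollout(series, teacher_forcing_ratio):
    """Predict each next value, feeding the ground-truth input with
    probability teacher_forcing_ratio, and the model's own previous
    output otherwise."""
    preds = []
    prev = series[0]
    for x in series[:-1]:
        # Choose the next input: ground truth vs. the model's last output.
        inp = x if rng.random() < teacher_forcing_ratio else prev
        prev = dummy_step(inp)
        preds.append(prev)
    return np.array(preds)

series = np.sin(np.linspace(0, 4 * np.pi, 200))
tf_preds = rollout(series, teacher_forcing_ratio=1.0)    # pure teacher forcing
free_preds = rollout(series, teacher_forcing_ratio=0.0)  # free-running generation
```

The idea is to anneal `teacher_forcing_ratio` from 1 towards 0 during training so that the model gradually learns to cope with its own (imperfect) outputs, which is the regime it faces at generation time.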