I’ve just implemented a joint biLSTM+CRF neural network for POS tagging and dependency parsing that runs on the GPU, and the results turned out to be great.
To get better precision, I need to train on a larger dataset, so I implemented multiprocessing. It works well with a dataset of 500 thousand sentences (the input file is 50M), where the precision reaches 80%. But when I try training with a dataset of a million sentences, the following happens:
First, the initial error rate is 0.97 and everything goes well for the first 500 thousand sentences: the error rate goes down to about 0.2.
Then the error rate jumps back to 0.97 after 513000 training sentences and never goes down again before the end of training. No errors or warnings occur. The precision of the final model is only 2.3%, the same as a newly created model. It seems that the model reinitializes after 513000 sentences and never updates its parameters again.
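One way to test the "reinitialized and no longer updating" guess would be to snapshot the parameters every few thousand sentences and check two things: whether any value has become NaN or infinite, and whether anything changed since the previous snapshot. Below is a minimal sketch of that check, assuming the parameters can be exported as flat lists of floats. The function `check_params` and the dictionary layout are hypothetical, not part of my actual code or of any library.

```python
import math

def check_params(prev, curr):
    """Compare two parameter snapshots (dict: name -> list of floats).

    Returns (names holding NaN/inf values, largest finite absolute change).
    The snapshot format is an assumption made for illustration only.
    """
    non_finite = sorted(
        name for name, values in curr.items()
        if any(not math.isfinite(v) for v in values)
    )
    max_change = 0.0
    for name, values in curr.items():
        old = prev.get(name)
        if old is None or len(old) != len(values):
            continue  # parameter is new or reshaped, nothing to compare
        for a, b in zip(old, values):
            diff = abs(a - b)
            if math.isfinite(diff):
                max_change = max(max_change, diff)
    return non_finite, max_change
```

If the first result is non-empty around sentence 513000, the parameters have probably blown up numerically rather than been reset. If it is empty and the largest change stays at 0.0 from then on, the updates have stopped.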
I spawned a process to record CPU, GPU, and memory usage, and nothing unusual shows up. I also tried starting training at the 512000th sentence and it works well, so the problem is not in the training data.
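For reference, here is roughly how streaming the data and starting from a given sentence can be done with a generator. This is a simplified sketch and not my exact loader: it assumes one whitespace-tokenized sentence per line, and the names `read_sentences` and `sentences_from` are made up for illustration.

```python
from itertools import islice

def read_sentences(path):
    """Lazily yield one sentence (a list of tokens) per non-empty line."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            if tokens:
                yield tokens

def sentences_from(path, start):
    """Stream sentences, skipping the first `start` of them."""
    return islice(read_sentences(path), start, None)
```

Calling `sentences_from(path, 512000)` would feed the trainer from the sentence after the 512000th onward while keeping only one line in memory at a time.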
I’m just wondering: since my model already works with a relatively large dataset (500 thousand sentences) and I use a generator to load the data, memory usage shouldn’t be the issue. So why doesn’t it work with a million sentences? Are there any hidden limitations or any parameters I need to take care of?
Thanks a lot for reading my post. If you have any ideas about this issue or have run into the same problem, I would be grateful if you could share them with me.