I have a time series forecasting model that uses both ordinary (i.e. global) parameters, such as weights, and some per-series parameters.
Additionally, I train a number of NNs concurrently. Each one receives a different subset of the dataset (the allocation happens once per epoch).
So, to speed things up, I am using multiprocessing, with one process per network. In the main program I create the trainers, one per network, and pass each of them to its worker process.
I call .share_memory() on both the global and the per-series parameters. At the end of training, inside the worker function, I save the trainer state (trainer.state) and pass it back to the main program, where I update the trainer state, like:
However, the accuracy of the fit is lower than what I get with a single-threaded C++/DyNet implementation. I suspect this is because I use several trainers (one per net) to update the same per-series parameters.
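To illustrate why I suspect this, here is a minimal single-process sketch in plain Python. It assumes SGD with momentum, which may not match the trainers I actually use, and `MomentumTrainer` is a hypothetical stand-in, not a PyTorch or DyNet class. It only shows that two trainers with separate internal state, taking turns on the same parameter, end up somewhere different from one trainer that sees every gradient.

```python
class MomentumTrainer:
    """Hypothetical stand-in for a trainer: SGD with momentum on one scalar."""

    def __init__(self, lr=1.0, momentum=0.5):
        self.lr = lr
        self.momentum = momentum
        self.velocity = 0.0  # per-trainer state, analogous to trainer.state

    def step(self, param, grad):
        # Accumulate velocity, then return the updated parameter value.
        self.velocity = self.momentum * self.velocity + grad
        return param - self.lr * self.velocity


def train(trainers, grads):
    """Apply the gradients in order, cycling through the given trainers."""
    param = 0.0
    for i, grad in enumerate(grads):
        param = trainers[i % len(trainers)].step(param, grad)
    return param


grads = [1.0, 1.0, 1.0, 1.0]

# One trainer sees all four gradients.
one_trainer = train([MomentumTrainer()], grads)

# Two trainers alternate, so each sees only half of the gradients.
two_trainers = train([MomentumTrainer(), MomentumTrainer()], grads)

print(one_trainer, two_trainers)  # -6.125 -5.0
```

With identical gradients the two setups still diverge, because each of the two trainers accumulates only its own momentum.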
In C++/DyNet I have a separate trainer that deals with the per-series parameters, but I do not see how I could do the same in PyTorch. In other words: how can I share a trainer across processes?
A trainer is an object that does not support the share_memory() function.
So, let me ask again: how can I share a trainer?