I’m trying to implement the following paper: Population Based Training, for a simple CIFAR classifier.
As part of this I need to train multiple models, with different hyperparameters, in parallel (they will be fed the same data). Each of these models would then update a global dict with its validation accuracy as well as its parameters. The models would then periodically and asynchronously explore and exploit using this dict.
Is there any way to train several different models (models with different hyperparameters) in parallel, each on a separate GPU? Additionally, how can they be made to update a global list or dictionary? Communication doesn’t have to be synchronous.
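To make it concrete, here is a rough sketch of the exploit/explore step I have in mind, operating on that global results dict. The hyperparameter names and the perturbation factors are just placeholders, not anything from the paper:

```python
import random

def exploit(results):
    """Copy hyperparameters from the best-scoring worker in the shared dict.

    `results` maps worker id -> {"accuracy": float, "hparams": dict}.
    """
    best_id = max(results, key=lambda k: results[k]["accuracy"])
    return dict(results[best_id]["hparams"])

def explore(hparams, rng):
    """Perturb each hyperparameter by a random factor (placeholder values)."""
    return {k: v * rng.choice([0.8, 1.2]) for k, v in hparams.items()}
```

Each worker would call these every few epochs, replacing its own hyperparameters with the perturbed copy of the current best.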
multiprocessing will make it a bit hard to share a global dict (it’s possible using managers, but not very intuitive). You can try spawning a few threads first, and see whether you are bottlenecked by Python’s GIL.
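In case it helps, a minimal sketch of the manager approach looks like this. The worker below is a stand-in for a real training loop, and the accuracy it reports is faked from the learning rate:

```python
import multiprocessing as mp

def train_worker(worker_id, lr, shared_results):
    # Stand-in for a real training loop; the "accuracy" is faked from lr.
    fake_accuracy = 1.0 - abs(lr - 0.01) * 10
    shared_results[worker_id] = {"accuracy": fake_accuracy, "lr": lr}

def run_demo():
    # "fork" keeps this sketch self-contained; see the multiprocessing docs
    # on start methods before using it with a real framework.
    ctx = mp.get_context("fork")
    manager = ctx.Manager()
    shared_results = manager.dict()  # proxy dict visible to all workers
    procs = [ctx.Process(target=train_worker, args=(i, lr, shared_results))
             for i, lr in enumerate([0.001, 0.01, 0.1])]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return dict(shared_results)
```

Each process writes into the manager’s proxy dict; after joining, the parent sees every worker’s entry.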
Thanks! Turns out multiprocessing seems to be what I’m looking for. I checked out a few tutorials on managers. Will give it a try.
I had a doubt though: would I need to specify GPUs, or does the multiprocessing module automatically take care of distributing processes across them? Also, can I use the Pool class? Will PyTorch throw an error if I try to map to too many processes?
However, what I want to do is run training functions in parallel while varying the hyperparameters. Could you tell me how to assign GPUs to these functions and run them in parallel with multiprocessing?
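Something like this is what I’m imagining: one process per hyperparameter setting, pinned to a GPU by worker rank. The round-robin assignment and hyperparameters are placeholders, and the torch calls are only sketched in comments since I haven’t wired them up yet:

```python
import multiprocessing as mp

NUM_GPUS = 2  # assumed GPU count; in real code use torch.cuda.device_count()

def train_on_gpu(rank, hparams, shared_results):
    device_id = rank % NUM_GPUS  # round-robin assignment of workers to GPUs
    # In the real training function the device would be selected up front:
    #   device = torch.device(f"cuda:{device_id}")
    #   model = Net(**hparams).to(device)
    shared_results[rank] = {"device": device_id, "hparams": hparams}

def launch(hparam_list):
    # Real CUDA training should use the "spawn" start method
    # (e.g. via torch.multiprocessing); "fork" here only keeps the
    # sketch self-contained.
    ctx = mp.get_context("fork")
    manager = ctx.Manager()
    shared = manager.dict()
    procs = [ctx.Process(target=train_on_gpu, args=(r, hp, shared))
             for r, hp in enumerate(hparam_list)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return dict(shared)
```

Would passing the device id into each worker like this be the right way to do it?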