I have 8 different models and 8 GPUs. The models are independent, and I want to evaluate their loss/error/forward pass independently, since that can be parallelized. What is the easiest way to parallelize this computation in PyTorch?
Sounds like what you’ve got is more commonly called “trivially” parallel (or “embarrassingly parallel” https://en.wikipedia.org/wiki/Embarrassingly_parallel): you want to have different models work on the same data without talking to each other. That’s easy. Just start the run 8 different times, and preface each command-line call with CUDA_VISIBLE_DEVICES=X, where X is the index of the GPU for that run (0 through 7 on an 8-GPU machine).
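If you'd rather not type the 8 commands by hand, here is a rough sketch of a small launcher that does the same thing from Python. The helper names (gpu_env, launch_per_gpu, make_command) are made up for illustration, and the command it runs is just a stand-in that prints which GPU it was given; you'd swap in your own evaluation script.

```python
import os
import subprocess
import sys


def gpu_env(gpu_id, base_env=None):
    """Return a copy of the environment that exposes only one GPU to a child process."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_per_gpu(make_command, n_gpus):
    """Start one independent process per GPU and wait for all of them.

    make_command(i) is a hypothetical callback returning the argv list for run i.
    Returns the exit codes in GPU order.
    """
    procs = [
        subprocess.Popen(make_command(i), env=gpu_env(i))
        for i in range(n_gpus)
    ]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    # Stand-in command: each child only reports the GPU index it was handed.
    # In a real run this might be ["python", "evaluate.py", "--model", str(i)]
    # (script name and flag are hypothetical).
    def make_command(i):
        code = "import os; print('run on GPU', os.environ['CUDA_VISIBLE_DEVICES'])"
        return [sys.executable, "-c", code]

    print(launch_per_gpu(make_command, n_gpus=8))
```

One thing to keep in mind: inside each child process the single visible GPU shows up as device 0, so the script itself can always just use "cuda:0".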
In “data parallel” applications (https://en.wikipedia.org/wiki/Data_parallelism), different subsets of the dataset are sent to different nodes, which work on their subset of the dataset, and the results are typically combined later.
If you’re worried about 8 copies of the dataset taking up too much RAM on your machine, and you want all 8 GPU workers to share the same CPU data, then… yeah, that’s a neat idea. I’m not sure what the name for that computing paradigm is. Maybe “model parallelism”, although that term usually refers to splitting a single model across devices. I doubt PyTorch has a pre-made method for this, but you should be able to do it using the .to(device) method in PyTorch 0.4. You could send the different models to different GPUs, and you’ll also have to explicitly send (a copy of) each batch of the dataset to each GPU as well, e.g. in the middle of a batch training loop. Let me know if that works. …I might try this myself!
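Here's roughly what I have in mind, as an untested-on-8-GPUs sketch rather than a recipe. The function names (pick_devices, evaluate_all), the toy nn.Linear models, the MSE loss, and the random data are all placeholders I made up. The dataset stays on the CPU once, and each batch is copied to every model's device with .to(device). It falls back to the CPU if no GPU is visible.

```python
import torch
import torch.nn as nn


def pick_devices(n_models):
    """One device per model: cycle over the visible GPUs, or fall back to CPU."""
    n_gpus = torch.cuda.device_count()
    if n_gpus == 0:
        return [torch.device("cpu")] * n_models
    return [torch.device(f"cuda:{i % n_gpus}") for i in range(n_models)]


def evaluate_all(models, devices, inputs, targets, batch_size=32):
    """Return each model's mean squared error over one shared CPU dataset."""
    loss_fn = nn.MSELoss(reduction="sum")
    # Keep running totals on each model's own device, and only read them back
    # at the very end, so the loop does not wait on one GPU before feeding the next.
    totals = [torch.zeros((), device=d) for d in devices]
    for model in models:
        model.eval()
    with torch.no_grad():
        for start in range(0, len(inputs), batch_size):
            x_cpu = inputs[start:start + batch_size]
            y_cpu = targets[start:start + batch_size]
            for k, (model, device) in enumerate(zip(models, devices)):
                x = x_cpu.to(device)  # per-device copy of the same batch
                y = y_cpu.to(device)
                totals[k] += loss_fn(model(x), y)
    return [(t / targets.numel()).item() for t in totals]


if __name__ == "__main__":
    torch.manual_seed(0)
    devices = pick_devices(8)
    # Toy stand-ins for the 8 independent models.
    models = [nn.Linear(16, 1).to(d) for d in devices]
    inputs = torch.randn(256, 16)   # the single shared copy, kept on the CPU
    targets = torch.randn(256, 1)
    print(evaluate_all(models, devices, inputs, targets))
```

Since CUDA kernel launches are asynchronous, the work on different GPUs may overlap even though this is a single Python loop, but how much you actually gain will depend on the models and on the host-to-device copies. If the overlap is poor, the separate-process approach is the safer bet.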
This is where I wish it were gotorch… parallelism code is so easy to write in Go. It’s fine, thanks. I had already run what you suggested; you’re 20ish hours late. Though I appreciate the sentiment, and the additional idea I hadn’t thought of: model parallelism.