I couldn’t find a similar thread, so excuse me if I’m duplicating.
I have a tiny model (think MNIST-scale) that I want to train 100 copies of. I have some large GPUs, and roughly 6 of these models fit on a single device at once. Is there a good way to train several of them concurrently without launching 10 copies of the same script, or reaching for the multiprocessing module from the Python standard library?
It seems like there should be an idiomatic way to do this: given the same data, train 10 models concurrently with different initializations on a single device, without instantiating 10 redundant copies of PyTorch that take up unnecessary memory.
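For concreteness, here is the kind of thing I'm imagining: a minimal sketch, assuming PyTorch >= 2.0, that uses the `torch.func` ensembling utilities (`stack_module_state`, `functional_call`, `vmap`) to batch N copies of one architecture into stacked parameter tensors on a single device. The tiny `Sequential` net, batch sizes, and learning rate are all placeholders for illustration, not my actual setup.

```python
import copy
import torch
from torch.func import stack_module_state, functional_call, vmap, grad

N_MODELS = 4  # would be 10 (or more) in the real setup

def make_model():
    # Stand-in for a small MNIST-scale classifier
    return torch.nn.Sequential(
        torch.nn.Linear(784, 32),
        torch.nn.ReLU(),
        torch.nn.Linear(32, 10),
    )

# N independently initialized copies of the same architecture
models = [make_model() for _ in range(N_MODELS)]

# Stack their parameters/buffers along a new leading "model" dimension
params, buffers = stack_module_state(models)

# A stateless template on the meta device; its weights are never used,
# only its structure, via functional_call below
base = copy.deepcopy(models[0]).to("meta")

def loss_fn(p, b, x, y):
    out = functional_call(base, (p, b), (x,))
    return torch.nn.functional.cross_entropy(out, y)

# One shared batch of (fake) data for all models
x = torch.randn(64, 784)
y = torch.randint(0, 10, (64,))

# One SGD step for all N models at once: vmap maps over the stacked
# params/buffers (dim 0) while broadcasting the same data to each model
grads = vmap(grad(loss_fn), in_dims=(0, 0, None, None))(params, buffers, x, y)
with torch.no_grad():
    for name in params:
        params[name] -= 0.1 * grads[name]
```

If something along these lines (or a CUDA-streams equivalent) is the blessed approach, I'd love to know.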
Thanks for any pointers!