Training multiple independent models at once

I have 50 completely independent models that I want to train in parallel on 8 GPUs. The training for one model lives in a script that I run like

python device_num

The simple way to do this is

import subprocess
for group in groups:
    processes = [subprocess.Popen(f'python {device}'.split()) for device in range(8)]
    for p in processes:  # block until every process in this group has finished
        p.wait()

where groups is the 50 models split into groups of 8.

The downside is that some models take longer to train than others, and every model in a group has to finish before the next group starts, so GPUs sit idle while they wait for the slowest one.

I was hoping to do something like multiprocess.spawn, but I need each process that finishes to return its device number, so it is clear which device is free to run the next model. I tried using Queue and Process from multiprocessing, but I can’t get more than 1 process to run at once.
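To make the goal concrete, here is a minimal sketch of the scheduling described above. It is one possible approach, not a tested solution for this setup. It assumes one worker thread per device, each pulling the next job from a shared queue as soon as its own subprocess exits, so the device number never has to be returned by the child process. The names `run_jobs`, `jobs`, and `make_cmd` are hypothetical, and the command in the demo is a placeholder rather than the real training script.

```python
import queue
import subprocess
import sys
import threading


def run_jobs(jobs, num_devices, make_cmd):
    """Run every job, with at most one job per device at a time.

    jobs: iterable of job identifiers (hypothetical, e.g. model indices).
    make_cmd: hypothetical callback (job, device) -> argv list for subprocess.
    Returns a list of (job, device, returncode) tuples in completion order.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put(job)

    results = []
    lock = threading.Lock()

    def worker(device):
        # Each worker owns one device and takes the next job once it is free.
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # no jobs left for this device
            returncode = subprocess.run(make_cmd(job, device)).returncode
            with lock:
                results.append((job, device, returncode))

    threads = [threading.Thread(target=worker, args=(d,)) for d in range(num_devices)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Placeholder command: a real run would build the training command here.
    demo = run_jobs(range(5), 2, lambda job, dev: [sys.executable, "-c", "pass"])
    print(sorted(job for job, _, _ in demo))
```

Threads are enough here because each one only waits on its subprocess; the training itself still runs in separate Python processes. For the case in the question, the call would presumably be `run_jobs(range(50), 8, make_cmd)` with `make_cmd` building the actual training command.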

Any help would be much appreciated. Thanks!