I usually use small backbones in my work, so if my machine has 4 GPUs I can tune hyperparameters by training 4 independent models, one per GPU. I therefore prepare each batch only once on the CPU and send it to all subprocesses through a queue, like this:
```python
ctx = mp.get_context('fork')
p = ctx.Process(target=secondary_training, args=(cfg, queue))
p.start()
```
It worked well before torch 1.12.
Now it raises an error saying that I should use another start method instead of 'fork'. But I do not understand how to configure it for my case, where the models are independent, with different hyperparameters and even different backbones, but the training data are the same.
Could you please tell me which parallel method suits this case, if 'fork' is no longer supported?
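For context, here is a stripped-down, torch-free sketch of what I assume the 'spawn' variant of my setup would look like (the `cfg` dicts, list batches, and the body of `secondary_training` are placeholders for my real configs, tensors, and training loop). Is this the right direction?

```python
import multiprocessing as mp

def secondary_training(cfg, queue):
    """Stand-in for the real training loop: cfg holds this model's
    hyperparameters, queue delivers the shared CPU-prepared batches."""
    total = 0.0
    while True:
        batch = queue.get()
        if batch is None:                  # sentinel: no more batches
            break
        total += sum(batch) * cfg["lr"]    # placeholder for a training step
    return total

def main():
    ctx = mp.get_context("spawn")           # 'spawn' instead of 'fork'
    configs = [{"lr": 0.1}, {"lr": 0.01}]   # one hyperparameter set per GPU
    # One queue per worker so every model sees every batch (a single
    # shared queue would *distribute* batches across workers, not
    # broadcast them).
    queues = [ctx.Queue() for _ in configs]
    workers = [ctx.Process(target=secondary_training, args=(cfg, q))
               for cfg, q in zip(configs, queues)]
    for p in workers:
        p.start()
    # Prepare each batch once on the CPU, then send it to all workers.
    for batch in ([1.0, 2.0], [3.0, 4.0]):
        for q in queues:
            q.put(batch)
    for q in queues:
        q.put(None)                         # one sentinel per worker
    for p in workers:
        p.join()

if __name__ == "__main__":
    main()
```

Note that with 'spawn' the worker function and its arguments must be picklable and defined at module level, which is why everything that children need lives outside the `__main__` guard.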