Hi, I am using this implementation of DeepLab for PyTorch and would like to perform hyperparameter optimization by training multiple models at the same time.
This implementation uses nn.DataParallel
to train a model on multiple GPUs.
I start one training process after executing export CUDA_VISIBLE_DEVICES=0,1. When I want to start the other training process, I get two different errors depending on the GPU IDs in CUDA_VISIBLE_DEVICES.
If I type export CUDA_VISIBLE_DEVICES=0,1,2,3 and train the second model on GPUs 2 and 3, I receive:
RuntimeError: module must have its parameters and buffers on device cuda:2 (device_ids[0]) but found one of them on device: cuda:0
On the other hand, if I execute export CUDA_VISIBLE_DEVICES=2,3, I receive AssertionError: Invalid device id.
How can I train two models at the same time while another process has already loaded its input data onto a GPU?
Do I have to specify on which device each model should run using .to('cuda:x')?
Yes, I would recommend using a single script with the DataLoader, creating multiple models, pushing each one to the desired device via to('cuda:id'), and just passing the data to each model.
Since the training is done on different devices, it should be executed in parallel.
Your approach of running multiple scripts with CUDA_VISIBLE_DEVICES would make it unnecessarily complicated to share the data between these processes.
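A minimal sketch of that single-script approach, assuming two GPUs and placeholder models and data (the actual DeepLab models, dataset, and hyperparameters would be substituted in):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset and models; replace with the real DeepLab models and data.
dataset = TensorDataset(torch.randn(256, 3, 32, 32), torch.randint(0, 10, (256,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model_a = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to('cuda:0')
model_b = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to('cuda:1')

opt_a = torch.optim.SGD(model_a.parameters(), lr=0.01)
opt_b = torch.optim.SGD(model_b.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

for data, target in loader:
    # Push the same batch to each model's device and train both models.
    out_a = model_a(data.to('cuda:0'))
    loss_a = criterion(out_a, target.to('cuda:0'))
    opt_a.zero_grad()
    loss_a.backward()
    opt_a.step()

    out_b = model_b(data.to('cuda:1'))
    loss_b = criterion(out_b, target.to('cuda:1'))
    opt_b.zero_grad()
    loss_b.backward()
    opt_b.step()
```

Even though the loop is written sequentially, CUDA kernels are launched asynchronously, so the two models' forward and backward passes largely overlap on their respective GPUs.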
Thank you. As soon as I stopped using CUDA_VISIBLE_DEVICES and specified .to(device), I could train multiple models, each on multiple GPUs, at the same time.
It looks like _check_balance(device_ids) somehow ends up with device id 0 even though nn.DataParallel(model, device_ids=[2,3]) was given GPU IDs 2 and 3. Presumably, with CUDA_VISIBLE_DEVICES=2,3 only two GPUs are visible to PyTorch and they are renumbered to 0 and 1, so device_ids=[2,3] no longer exist. That is why using CUDA_VISIBLE_DEVICES=2,3 resulted in an AssertionError: Invalid device id.
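For completeness, a sketch of the setup that seems to have worked here, with CUDA_VISIBLE_DEVICES left unset and placeholder models: each model gets its own nn.DataParallel with explicit device_ids and is moved to the first device in that list, which also avoids the RuntimeError about parameters not being on device_ids[0].

```python
import torch
import torch.nn as nn

# Placeholder models; replace with the actual DeepLab models.
model_1 = nn.Linear(10, 10)
model_2 = nn.Linear(10, 10)

# Each model spans its own pair of GPUs. The parameters must live on
# device_ids[0], hence the .to(...) calls.
model_1 = nn.DataParallel(model_1, device_ids=[0, 1]).to('cuda:0')
model_2 = nn.DataParallel(model_2, device_ids=[2, 3]).to('cuda:2')

x = torch.randn(8, 10)
out_1 = model_1(x.to('cuda:0'))
out_2 = model_2(x.to('cuda:2'))
```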
Hello, have you solved this problem? I am new to this field. I would like to run inference with two models on two GPUs using the same input. Which method do you recommend? I ask because, from what I saw, DistributedDataParallel only supports running a single model on multiple GPUs.