Running two models on two GPUs, one model per GPU

I would like to run 2 models (each on its own GPU) from one script. I can run them in subprocesses and send each model and its tensors to a specific GPU with tensor.cuda(device_id), so that the two models run independently. Both models are created successfully, and I can see that each takes GPU memory. During inference, I get the following error:

transform: failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

I guess the problem may be that some functions run on the default device, and I would have to call torch.cuda.set_device() to change the global (default) device. How can I work around this to run the two model instances, one on each GPU?

You don’t need to use subprocesses; CUDA calls are asynchronous.
You also don’t need set_device.
You can allocate the model and inputs on the chosen device with .cuda(idx) or .to(f'cuda:{idx}')

If you really want to do that, you have to use daemon threads.
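To make the suggestion above concrete, here is a minimal sketch of two models, each placed on its own device with .cuda(idx)/.to(...), with no subprocesses and no torch.cuda.set_device(). The tiny nn.Linear models are hypothetical stand-ins, and the sketch falls back to CPU on machines with fewer than two GPUs:

```python
import torch
import torch.nn as nn

def make_devices():
    # Use cuda:0 and cuda:1 when available; otherwise run everything on CPU
    if torch.cuda.device_count() >= 2:
        return [torch.device('cuda:0'), torch.device('cuda:1')]
    return [torch.device('cpu'), torch.device('cpu')]

devices = make_devices()
# One toy model per device, placed explicitly with .to(device)
models = [nn.Linear(8, 4).to(dev).eval() for dev in devices]

def infer(i, batch):
    # Move the input to the same device as model i; results come back on CPU
    x = batch.to(devices[i])
    with torch.no_grad():
        return models[i](x).cpu()

out0 = infer(0, torch.randn(2, 8))
out1 = infer(1, torch.randn(3, 8))
print(out0.shape, out1.shape)  # torch.Size([2, 4]) torch.Size([3, 4])
```

The point is that device placement is carried by the tensors and module parameters themselves, so no global default-device state needs to change.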

Hi, thank you for your reply. I did use model.cuda(idx) and inputs.cuda(idx). Now I create only one model instance with model.cuda(1) and inputs.cuda(1), and the exception is still there. I think somewhere it uses the default device, which is gpu:0.
Here is some code (creating 10 instances):

class ModelA:
    def __init__(self, gpu=0):
        self.gpu = gpu
        model_path = 'model.pth'
        mixed_precision = True
        model, state = Model.load(filename=model_path)
        self.model = model.cuda(self.gpu)
        self.model = amp.initialize(self.model, None,
                                    opt_level='O2' if mixed_precision else 'O0',
                                    keep_batchnorm_fp32=True,
                                    verbosity=0)
        self.model.eval()

    def infer(self, inputs):
        inputs = inputs.cuda(self.gpu)
        with torch.no_grad():
            results = self.model(inputs)
        results = results.cpu()
        return results

models = []
for i in range(10):
    models.append(ModelA(i % 2))  # 10 instances allocated across the 2 valid GPUs
# requests are received and inference runs on one of these 10 instances

Are you using set_device?
You shouldn’t use it this way.
Are you running other processes that use the GPUs?

Does it work without amp?
Everything still in the main thread?

I don’t use set_device; I just copy the model to the GPU when creating the instance and copy the input data to a specific GPU at each inference.
I am running this script on gpu:0 and gpu:1, and I do run other processes on gpu:2 with other scripts. I think they should run independently.

The error is still there without amp.
It is a Flask service that receives and processes requests concurrently. The models are created in the main thread, but Flask processes requests in multiple threads; each thread gets one model instance (from a queue) to do inference.
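The serving pattern described above can be sketched as follows. All names are hypothetical: a plain callable stands in for the real ModelA, and plain threads stand in for Flask request handlers. Instances are created once in the main thread, put in a Queue, and each handler checks one out, runs inference, and returns it:

```python
import queue
import threading

class DummyModel:
    """Stand-in for the real per-GPU model instance."""
    def __init__(self, gpu):
        self.gpu = gpu
    def infer(self, x):
        return x * 2  # placeholder for real inference on self.gpu

pool = queue.Queue()
for i in range(10):
    pool.put(DummyModel(i % 2))  # 10 instances spread across 2 GPUs

results = []
lock = threading.Lock()

def handle_request(x):
    model = pool.get()       # blocks until an instance is free,
    try:                     # so no two threads share one model
        y = model.infer(x)
    finally:
        pool.put(model)      # always return the instance to the pool
    with lock:
        results.append(y)

threads = [threading.Thread(target=handle_request, args=(i,)) for i in range(20)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))  # even numbers 0 through 38
```

The get/put pair guarantees exclusive use of each instance, which matters because a single model should not serve two requests at once.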

So, a couple of things.
Can you try to instantiate the model inside the thread?
Is each model being used by only one thread at a time?
Lastly, can you check whether you are running OOM?

Hi, I finally found that the problem comes from some functions running on torch.cuda.current_device(), so I worked around it by adding the following context manager to make sure they run on the right device:

with torch.cuda.device(gpu_id):
    # functions that implicitly use torch.cuda.current_device()
    # now run on gpu_id instead of the default gpu:0
    ...

It works now. Thank you!
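The fix can be sketched end to end like this. The infer function and gpu argument mirror the ModelA class earlier in the thread but are otherwise hypothetical, and the snippet is guarded so it degrades to a CPU-only run on machines without CUDA:

```python
import torch

def infer(model, inputs, gpu):
    # Wrap per-request work in torch.cuda.device(...) so any op that
    # implicitly targets torch.cuda.current_device() uses this model's GPU
    if torch.cuda.is_available():
        model = model.cuda(gpu)
        with torch.cuda.device(gpu):
            inputs = inputs.cuda(gpu)
            with torch.no_grad():
                return model(inputs).cpu()
    # CPU-only fallback so the sketch runs anywhere
    with torch.no_grad():
        return model(inputs)

out = infer(torch.nn.Linear(8, 4).eval(), torch.randn(2, 8), gpu=0)
print(out.shape)  # torch.Size([2, 4])
```

The context manager changes the current device only inside the with block, so concurrent threads serving models on different GPUs do not clobber a shared global default.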