Run two nets in parallel on a single machine with multiple GPUs

Hi, I want to run two different networks that take the same input in parallel on multiple GPUs.
The main purpose is to speed up inference. So I did:

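    import time
    import torch

    # model1 and model2 are assumed to be nn.Module instances defined elsewhere
    inputs = torch.randn(1, 3, 256, 256)
    inputs1 = inputs.to('cuda:0')
    inputs2 = inputs.to('cuda:1')
    model1 = model1.to('cuda:0')
    model2 = model2.to('cuda:1')

    # Create each stream on the device whose work it will carry
    s1 = torch.cuda.Stream(device='cuda:0')
    s2 = torch.cuda.Stream(device='cuda:1')
    torch.cuda.synchronize('cuda:0')
    torch.cuda.synchronize('cuda:1')

    i = 0
    time_spent = []
    while i < 1000:
        with torch.no_grad():
            start_time = time.time()
            with torch.cuda.stream(s1):
                _ = model1(inputs1)
            with torch.cuda.stream(s2):
                _ = model2(inputs2)

        # synchronize() only waits on one device, so wait on both before stopping the clock
        torch.cuda.synchronize('cuda:0')
        torch.cuda.synchronize('cuda:1')
        if i > 100:  # skip the first iterations as warm-up
            time_spent.append(time.time() - start_time)
        i += 1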

However, when I measure the average time per iteration, I do not see any improvement compared to running model1 and model2 sequentially on a single GPU. Am I missing something here? I am running the test on Ubuntu 18.04 with two RTX 2080 Ti GPUs. Many thanks!

Depending on the models, and therefore the workload, the CPU might not be able to run ahead and schedule the kernel launches fast enough to keep both GPUs busy.
You could profile the script with a tool such as Nsight Systems and check whether the kernels on the two devices actually overlap, or whether they are so short that they end up being executed “sequentially” across the two devices.
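Before reaching for a full profiler, a rough first check is to compare how long the host spends issuing the two forward passes with how long it takes until both devices have finished. The sketch below is only an illustration under assumptions: `tiny_net`, `pick_devices`, `sync` and `measure` are hypothetical helpers that are not part of the original post, the small conv net stands in for your real models, and the script falls back to the CPU when two GPUs are not available. It does not replace a timeline from Nsight Systems, which is still needed to see whether kernels overlap.

```python
import time

import torch
import torch.nn as nn


def pick_devices():
    """Use two GPUs if present; otherwise fall back to the CPU (assumption, for portability)."""
    if torch.cuda.is_available() and torch.cuda.device_count() >= 2:
        return torch.device("cuda:0"), torch.device("cuda:1")
    return torch.device("cpu"), torch.device("cpu")


def sync(devices):
    """Wait for all queued work on every CUDA device in `devices`."""
    for d in devices:
        if d.type == "cuda":
            torch.cuda.synchronize(d)


def measure(model1, model2, x1, x2, iters=20, warmup=5):
    """Return (mean host-side launch time, mean total time) in seconds."""
    devices = (x1.device, x2.device)
    launch, total = [], []
    with torch.no_grad():
        for i in range(warmup + iters):
            sync(devices)                # start each iteration from an idle state
            t0 = time.perf_counter()
            model1(x1)                   # on CUDA these calls only enqueue kernels
            model2(x2)
            t1 = time.perf_counter()     # host is done issuing work
            sync(devices)                # both devices are done executing
            t2 = time.perf_counter()
            if i >= warmup:
                launch.append(t1 - t0)
                total.append(t2 - t0)
    return sum(launch) / len(launch), sum(total) / len(total)


def tiny_net():
    # Hypothetical stand-in for model1 / model2 from the question
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1),
        nn.ReLU(),
        nn.Conv2d(8, 8, 3, padding=1),
    ).eval()


dev1, dev2 = pick_devices()
model1 = tiny_net().to(dev1)
model2 = tiny_net().to(dev2)
inputs = torch.randn(1, 3, 256, 256)

launch_s, total_s = measure(model1, model2, inputs.to(dev1), inputs.to(dev2))
print(f"launch: {launch_s * 1e3:.3f} ms, total: {total_s * 1e3:.3f} ms, "
      f"ratio: {launch_s / total_s:.2f}")
```

If the launch time is close to the total time on the GPUs, the host is spending nearly the whole iteration issuing kernels, which would be consistent with the CPU not running ahead far enough for the two devices to overlap. On the CPU fallback the ratio is close to 1 by construction, since the calls run synchronously there.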
