Assign two models to two GPUs and run concurrently

Hi community,

Let’s say I have 2 CNN models and I want to assign each model to its own GPU (my machine has 2 GPUs), then have the two models run concurrently. How can I do that? Everything I have tried so far ends up running them sequentially. Any suggestions?

Your approach should be possible, but if you are trying to drive both GPUs from a single process, the CPU might not be able to run ahead and schedule the kernels fast enough.
Also, at any synchronization point the CPU will block and has to wait before it can continue with the program execution.
The easiest way would probably be to launch separate processes (scripts), or to make sure you are not blocking the CPU in your single script.
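For the separate-process route, a common pattern is to restrict each process to a single GPU via `CUDA_VISIBLE_DEVICES`, so each script simply uses `cuda:0`. A minimal sketch (the `train.py` script name and `--model` flag are placeholders for your own training scripts):

```shell
# Each process sees exactly one GPU and addresses it as cuda:0.
CUDA_VISIBLE_DEVICES=0 python train.py --model densenet &
CUDA_VISIBLE_DEVICES=1 python train.py --model resnet &
wait  # block until both background jobs have finished
```

Since the two jobs are independent OS processes, neither can stall the other on Python-side synchronization points.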

I just got the two models running on two GPUs concurrently. I am so excited.

import torch
import torch.nn as nn
import torchvision
from concurrent.futures import ThreadPoolExecutor, wait
import time

device_0 = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device_1 = torch.device("cuda:1" if torch.cuda.is_available() else "cpu")

start_time = time.time()

dense = torchvision.models.densenet121(pretrained=False).to(device_0)
rest = torchvision.models.resnet101(pretrained=False).to(device_1)

dummy1 = torch.ones((200, 3, 32, 32)).to(device_0)
dummy2 = torch.ones((200, 3, 32, 32)).to(device_1)

def train(model, value, epochs):
    # Run repeated forward passes on the model's device and time them.
    ti = time.perf_counter()
    for _ in range(epochs):
        ret = model(value)
    to = time.perf_counter()
    return ret.shape, to - ti

output = []

with ThreadPoolExecutor() as executor:
    futures = []
    futures.append(executor.submit(train, dense, dummy1, 1000))
    futures.append(executor.submit(train, rest, dummy2, 1000))
    complete_futures, incomplete_futures = wait(futures)
    for f in complete_futures:
        result = f.result()
        print(result)
        output.append(result)

elapsed = (time.time() - start_time)
print(f"Total time of execution {round(elapsed, 4)} second(s)")
print("Output is:",output)
The output:

(torch.Size([200, 1000]), 25.267832748999353)
(torch.Size([200, 1000]), 28.169526882003993)
Total time of execution 28.7672 second(s)
Output is: [(torch.Size([200, 1000]), 25.267832748999353), (torch.Size([200, 1000]), 28.169526882003993)]
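One caveat about the timings: CUDA launches kernels asynchronously, so wrapping a loop of forward passes in `time.perf_counter()` can under-measure unless you synchronize before and after. A minimal sketch of synchronized timing (the `timed` helper is my own illustration, not a PyTorch API; it falls back cleanly on CPU):

```python
import time
import torch

def timed(fn, device):
    # Drain any already-queued kernels before starting the clock.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    t0 = time.perf_counter()
    out = fn()
    # Wait for the kernels launched by fn() to actually finish.
    if device.type == "cuda":
        torch.cuda.synchronize(device)
    return out, time.perf_counter() - t0

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(64, 64).to(device)
x = torch.ones(128, 64, device=device)
out, secs = timed(lambda: model(x), device)
print(out.shape, secs)
```

Without the second `synchronize`, the timer can stop while kernels are still executing on the GPU, so the measured time mostly reflects launch overhead.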