Multiprocessing with GPU less optimal?

Hi, I’m trying to use multiprocessing for my model processing.
Multiprocessing seems like a good fit because each individual task finishes extremely fast, so I’m hoping to distribute the work across processes since the GPU and machine have spare bandwidth.
But when I incorporate multiprocessing, processing slows down dramatically.

Without multiprocessing, the GPU sits at 8GB/10GB of memory usage.
With multiprocessing, the GPU only hits around 1.5GB/10GB of usage.
Seems odd. Any help is appreciated :slight_smile:
I’ve attached a skimmed version of what I am currently using.

import numpy as np
import torch
import torch.nn.functional as F
from functools import partial
from torch.multiprocessing import Pool

# model, device, model_file, N, and M are defined elsewhere in the full script

def model_process(chunk, segment):
    img = segment['image']
    img = img.to(device, dtype=torch.float)
    with torch.no_grad():  # inference only, no gradients needed
        output = model(img)
    output = F.interpolate(output, (N, M))
    for i in range(len(output)):
        x = output[i, 0, :, :].cpu().numpy()
        y = output[i, 1, :, :].cpu().numpy()


def process_batch(dataloader):
    device = torch.device("cuda:0")
    model = torch.load(model_file, map_location=device)
    model.eval()
    model.share_memory()
    chunk = np.zeros((N, M), dtype=np.uint16)  # np.zeros takes the shape as a tuple
    pool = Pool(8)
    pool.map(partial(model_process, chunk), dataloader)
    pool.close()
    pool.join()
    return