Parallel async inference of multiple models gives CUDA error: initialization error

I am trying to run inference with multiple models in parallel on CUDA. My use case is to run them all on the same input image. I get the following error.

RuntimeError: CUDA error: initialization error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Even with CUDA_LAUNCH_BLOCKING=1 set, I still get RuntimeError: CUDA error: initialization error

A minimal working example (MWE):

import torch
import torch.multiprocessing as mp
from torchvision import models


if __name__ == '__main__':

    num_processes = 2

    # Input tensor is created on the GPU in the parent process.
    input_image_pytorch = torch.randn((1, 3, 256, 256), requires_grad=False, device="cuda")

    processes = []
    for rank in range(num_processes):
        model = models.resnet18().eval().to("cuda")
        # Each child process calls the model directly on the shared input.
        p = mp.Process(target=model, args=(input_image_pytorch,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()

Solved this by adding mp.set_start_method('spawn') before creating the processes. The default fork start method on Linux copies the parent's already-initialized CUDA context into each child, which the CUDA runtime does not support; spawn starts each child with a fresh interpreter so it initializes its own CUDA context.
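
For reference, here is a minimal sketch of the working version. It assumes torch.multiprocessing can hand the CUDA input tensor and model parameters to the spawned children (PyTorch shares CUDA tensors between processes via IPC); run_inference is a hypothetical wrapper added here so the forward pass runs under torch.no_grad():

import torch
import torch.multiprocessing as mp
from torchvision import models


def run_inference(model, input_image):
    # One forward pass per child process; no_grad avoids building
    # an autograd graph during inference.
    with torch.no_grad():
        output = model(input_image)
    print(output.shape)


if __name__ == '__main__':

    # 'spawn' starts each child with a fresh Python interpreter, so every
    # process initializes its own CUDA context instead of inheriting a
    # forked (and unusable) copy of the parent's context.
    mp.set_start_method('spawn')

    num_processes = 2

    input_image_pytorch = torch.randn((1, 3, 256, 256), requires_grad=False, device="cuda")

    processes = []
    for rank in range(num_processes):
        model = models.resnet18().eval().to("cuda")
        p = mp.Process(target=run_inference, args=(model, input_image_pytorch))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()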