Hello, I am trying to spawn multiple processes that each have their own CUDA stream and can synchronize with a main process.
The goal is for each stream to receive camera images, push them onto the GPU, do some preprocessing, and then hand the data over to the main process for a machine learning application.
I found an example that works with multiple streams and multiprocessing here:
The example simply creates new streams in every process, because the spawn
start method reinitializes the stream in each process.
What I need is a shared stream that I can synchronize on, so I know when I can access the data after the calculations have finished, to avoid race conditions.
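For context, within a single process this synchronization is straightforward with the documented Stream.synchronize() API; this is the pattern I want across processes (minimal sketch, the helper name preprocess_and_sync is just for illustration):

```python
import torch

def preprocess_and_sync(device="cuda:0"):
    """Run work on a side stream, then sync so the result is safe to read."""
    if not torch.cuda.is_available():
        return None  # fallback for machines without a GPU
    stream = torch.cuda.Stream(device=device)
    with torch.cuda.stream(stream):
        data = torch.ones((2, 2), device=device) * 2  # async work on the stream
    stream.synchronize()  # block until the stream's queued work is done
    return data.sum().item()  # safe to read the tensor now

result = preprocess_and_sync()
```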
I tried to use Stream.cuda_stream to pass the raw stream pointer into an ExternalStream, as seen in the example below, but I get the error:
RuntimeError: CUDA error: invalid resource handle
Here is my toy example:
import os
import time

import torch
from torch.multiprocessing import Process, set_start_method

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
set_start_method("spawn", True)


def process(stream_ptr, data):
    # Rebuild the parent's stream from its raw pointer -- this is where
    # "RuntimeError: CUDA error: invalid resource handle" is raised
    stream = torch.cuda.ExternalStream(stream_ptr, "cuda:0")
    with torch.cuda.stream(stream):
        data[:] = data * 2


if __name__ == "__main__":
    stream = torch.cuda.Stream(device='cuda:0')
    with torch.cuda.stream(stream):
        data = torch.ones((2, 2), device='cuda:0')
        data.share_memory_()

    p1 = Process(target=process, args=(stream.cuda_stream, data))
    p1.start()

    # Poll the shared tensor for up to 10 s to see when the child's write lands
    timeout_start = time.perf_counter()
    while time.perf_counter() - timeout_start < 10:
        print(data)
        time.sleep(1)
    p1.join()
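Side question: would torch.cuda.Event(interprocess=True) be the intended mechanism here, rather than sharing the stream object itself? My understanding of that API is sketched below (not verified end to end on my side; parent_side/child_side are just illustrative names):

```python
import torch

# Sketch: the parent records an interprocess event on its stream; the child
# rebuilds the event from an IPC handle and waits on it, instead of trying
# to reconstruct the parent's stream from a raw pointer.

def parent_side(stream):
    event = torch.cuda.Event(interprocess=True)
    event.record(stream)       # mark "preprocessing done" on the stream
    return event.ipc_handle()  # handle that can be sent to a child process

def child_side(ipc_handle, device="cuda:0"):
    event = torch.cuda.Event.from_ipc_handle(device, ipc_handle)
    event.synchronize()        # blocks until the parent's recorded work finished
```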