Hello, I am trying to spawn multiple processes that each have their own CUDA stream and can synchronize with a main process.
The goal is to have each process receive camera images on its own stream, move them to the GPU, do some preprocessing, and then hand the data over to the main process for a machine learning application.
I found an example that combines multiple streams with multiprocessing. It simply creates new streams in every process, because the spawn start method reinitializes the Stream in each process.
What I need is a shared stream that I can synchronize on, so that I know when the calculations are finished and I can access the data without race conditions.
I tried to use Stream.cuda_stream to pass the stream pointer to a torch.cuda.ExternalStream in the child process, as shown in the example below, but I get the error:
RuntimeError: CUDA error: invalid resource handle
Here is my toy example:
import os
import time

import torch
from torch.multiprocessing import Process, set_start_method

os.environ['CUDA_LAUNCH_BLOCKING'] = "1"
set_start_method("spawn", True)


def process(stream_ptr, data):
    # Rebuild the parent's stream from its raw pointer in the child process.
    stream = torch.cuda.ExternalStream(stream_ptr, "cuda:0")
    with torch.cuda.stream(stream):
        data[:] = data * 2


if __name__ == "__main__":
    stream = torch.cuda.Stream(device='cuda:0')
    with torch.cuda.stream(stream):
        data = torch.ones((2, 2), device='cuda:0')
        data.share_memory_()

    p1 = Process(target=process, args=(stream.cuda_stream, data))
    p1.start()

    timeout_start = time.perf_counter()
    while time.perf_counter() - timeout_start < 10:
        print(data)
        time.sleep(1)

    p1.join()