Hello everyone, I am quite confused about CUDA streams and threads. Here is my understanding:
- Thread: a processing thread, which I understand as one instance of a kernel invocation; I can launch many threads in parallel to perform the same task.
- CUDA stream: from what I have read, a CUDA stream seems to be used mainly to optimize the sequence memcpyh2d → kernel call → memcpyd2h.
So, if I understand correctly, CUDA streams and threads are meant to work together. Instead of this (the code below is just pseudocode):
def some_kernel(job):
    memcpyh2d(job)
    kernel(job)
    memcpyd2h(job)
we should use:
def some_kernel(job):
    pieces = partition_into_pieces(job)
    for idx, p in enumerate(pieces):
        memcpyh2d(p, stream=idx)
        kernel(p, stream=idx)
        memcpyd2h(p, stream=idx)
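To make my intuition concrete, here is a toy timing model in plain Python (no GPU or CUDA library involved). It rests on simplifications of my own that may not match real hardware: the device has one host-to-device copy engine, one compute engine, and one device-to-host copy engine; each engine handles one piece at a time; and the three stages of a piece run in order. The function name `pipeline_time` and all timings are made up for illustration.

```python
def pipeline_time(n_pieces, t_h2d, t_kernel, t_d2h):
    """Finish time when every piece runs h2d -> kernel -> d2h in its own stream.

    Toy model (my assumption): three independent engines, each busy with at
    most one piece at a time; stages of a single piece execute in order.
    """
    h2d_free = kernel_free = d2h_free = 0.0  # time at which each engine is idle again
    for _ in range(n_pieces):
        # The copy-in starts as soon as the h2d engine is free.
        h2d_done = h2d_free + t_h2d
        h2d_free = h2d_done
        # The kernel needs its input copied and the compute engine free.
        kernel_done = max(h2d_done, kernel_free) + t_kernel
        kernel_free = kernel_done
        # The copy-out needs the kernel finished and the d2h engine free.
        d2h_done = max(kernel_done, d2h_free) + t_d2h
        d2h_free = d2h_done
    return d2h_free


# Whole job in one go: each stage takes 4 time units, nothing overlaps.
serial = pipeline_time(1, 4.0, 4.0, 4.0)
# Same job cut into 4 pieces: each stage takes 1 time unit per piece.
split = pipeline_time(4, 1.0, 1.0, 1.0)
print(serial, split)  # 12.0 6.0
```

Under this model the split version finishes in half the time, because the copy of one piece overlaps with the kernel of the previous one. That is the benefit I expect from streams, if my picture is right.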
Am I understanding this correctly?