The relationship between CUDA stream and thread

Hello everyone, I am quite confused about CUDA stream and thread. Here is my understanding:

  • Thread: a processing thread; my understanding is that a single kernel launch creates many threads that run in parallel, all performing the same task on different data.
  • CUDA stream: from what I have read, a CUDA stream is mainly used to overlap memcpy H2D → kernel launch → memcpy D2H sequences.

So, if I understand correctly, CUDA streams and threads are meant to work together. Instead of (the code below is just pseudocode):

def some_kernel(job):
   memcpyh2d(job)
   kernel(job)
   memcpyd2h(job)

we should use:

def some_kernel(job):
   pieces = partition_into_pieces(job)
   for idx, p in enumerate(pieces):
      memcpyh2d(p, stream=idx)
      kernel(p, stream=idx) 
      memcpyd2h(p, stream=idx)
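
For reference, here is a minimal CUDA C sketch of that chunked pattern, assuming a simple element-wise kernel; the kernel name `process`, the stream count, and the block size are illustrative, and the host buffer must be pinned (allocated with `cudaMallocHost`) for the async copies to actually overlap:

```cuda
#include <cuda_runtime.h>

__global__ void process(float *d, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] *= 2.0f;              // placeholder per-element work
}

void run_chunked(float *h, int n) {       // h: pinned host memory
    const int NSTREAMS = 4;
    int chunk = (n + NSTREAMS - 1) / NSTREAMS;
    cudaStream_t streams[NSTREAMS];
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    for (int s = 0; s < NSTREAMS; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < NSTREAMS; ++s) {
        int off = s * chunk;
        if (off >= n) break;
        int len = (off + chunk <= n) ? chunk : n - off;
        // H2D copy, kernel, and D2H copy for this piece are queued on
        // stream s; pieces queued on different streams may overlap.
        cudaMemcpyAsync(d + off, h + off, len * sizeof(float),
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(len + 255) / 256, 256, 0, streams[s]>>>(d + off, len);
        cudaMemcpyAsync(h + off, d + off, len * sizeof(float),
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();              // wait for all streams to drain
    for (int s = 0; s < NSTREAMS; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(d);
}
```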

Am I understanding this correctly?

A CUDA stream is a queue of GPU operations that execute in order: the order in which operations are enqueued on a stream determines the order in which they run on the device.

You can use multiple streams to overlap operations, as also described here.

Multiple threads execute the parallel portion of your CUDA kernel, and this blog post might be a good starting point.
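
To make the thread side concrete: one kernel launch spawns many threads at once, and each thread uses its built-in index to pick which element it works on. A minimal sketch (the kernel name and launch geometry are illustrative):

```cuda
__global__ void add_one(float *data, int n) {
    // every launched thread computes its own global index
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;   // one element per thread
}

// launch: 256 threads per block, enough blocks to cover n elements
// add_one<<<(n + 255) / 256, 256>>>(d_data, n);
```

So a "thread" is not a kernel invocation itself; it is one of the many parallel workers created by a single kernel launch, while a stream is the queue that a whole launch (and its surrounding copies) is placed into.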
