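A CUDA stream is a queue of GPU operations (kernel launches, memory copies) that execute in the order in which they were added to the queue. Operations within one stream never overlap each other, but operations in different streams have no ordering guarantee relative to one another and may run concurrently.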
You can use multiple streams to overlap operations, for example a memory copy in one stream with a kernel execution in another, as is also described here.
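As a rough illustration of overlapping work with two streams, here is a minimal sketch. The `scale` kernel, the buffer sizes, and the `CHECK` macro are made up for the example. Whether the copies and kernels actually overlap depends on the device and driver. Pinned host memory (`cudaMallocHost`) is used because pageable memory generally prevents `cudaMemcpyAsync` from running asynchronously.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper for this sketch: abort on any CUDA runtime error.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err)); \
            exit(1);                                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel for illustration: doubles every element.
__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;          // elements handled by each stream
    const int kStreams = 2;
    const size_t bytes = n * sizeof(float);

    float *host[kStreams];
    float *dev[kStreams];
    cudaStream_t streams[kStreams];

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamCreate(&streams[s]));
        CHECK(cudaMallocHost((void **)&host[s], bytes));  // pinned host buffer
        CHECK(cudaMalloc((void **)&dev[s], bytes));
        for (int i = 0; i < n; ++i) host[s][i] = 1.0f;
    }

    // Within each stream: copy in, run kernel, copy out, strictly in that order.
    // Across the two streams, the runtime is free to overlap these operations.
    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaMemcpyAsync(dev[s], host[s], bytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(dev[s], n);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(host[s], dev[s], bytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    for (int s = 0; s < kStreams; ++s) {
        CHECK(cudaStreamSynchronize(streams[s]));  // wait for this stream's queue to drain
        printf("stream %d: first element = %.1f\n", s, host[s][0]);
        CHECK(cudaFree(dev[s]));
        CHECK(cudaFreeHost(host[s]));
        CHECK(cudaStreamDestroy(streams[s]));
    }
    return 0;
}
```

The fourth launch parameter in `<<<grid, block, sharedMem, stream>>>` is what assigns a kernel to a stream. If you leave it out, the kernel goes to the default stream.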
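Threads are a separate concept from streams: many GPU threads execute the parallel portion of your CUDA kernel at the same time, each typically working on its own piece of the data. This blog post might be a good starting point.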
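To make the thread side concrete, here is a small sketch in which every GPU thread computes one element of a vector sum. The kernel name `add`, the array size, and the launch configuration are illustrative choices. Managed memory (`cudaMallocManaged`) is used only to keep the example short.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles exactly one index i; the guard covers the last,
// partially filled block.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1000;
    float *a, *b, *c;
    cudaMallocManaged(&a, n * sizeof(float));
    cudaMallocManaged(&b, n * sizeof(float));
    cudaMallocManaged(&c, n * sizeof(float));

    for (int i = 0; i < n; ++i) {
        a[i] = 1.0f;
        b[i] = 2.0f;
    }

    const int threadsPerBlock = 256;
    const int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;  // 4 blocks, 1024 threads
    add<<<blocks, threadsPerBlock>>>(a, b, c, n);
    cudaDeviceSynchronize();  // wait for the kernel before reading c on the host

    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (c[i] != 3.0f) ok = false;
    }
    printf("%s\n", ok ? "all elements correct" : "mismatch found");

    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return ok ? 0 : 1;
}
```

This kernel is launched on the default stream. The two ideas combine naturally: the threads parallelize the work inside one kernel, while streams decide how whole kernels and copies are ordered relative to each other.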