Multithread multistream

I am working in a multi-threaded, multi-stream environment with a single device, and I use ‘setCurrentCUDAStream()’ to select streams.
One thread sets the current stream to stream 1 and then queues ‘tensor.sum()’. I expected that if another thread changed the current stream to stream 2 before ‘tensor.sum()’ was queued, ‘tensor.sum()’ would run on stream 2. Profiling shows that this does not happen. I repeated the process several times in a for loop, and sum() always ran on stream 1.

How is this possible?


The current CUDA stream is thread-local.
So each thread has its own current stream, and changing it in one thread does not affect the stream used by any other thread.
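
To make the thread-local behavior concrete, here is a minimal toy model in plain C++. It does not use libtorch or CUDA: an int stands in for a stream handle, and `current_stream`, `set_current_stream`, and `stream_seen_after_other_thread_switches` are hypothetical names chosen for illustration, not the library's internals. It only shows why a second thread's call cannot change what the first thread sees.

```cpp
#include <cassert>
#include <thread>

// Toy model only: an int stands in for a CUDA stream handle.
// Each thread gets its own copy of this variable, starting at 0.
thread_local int current_stream = 0;

// Hypothetical stand-in for a "set current stream" call.
void set_current_stream(int id) { current_stream = id; }

// Returns the stream the calling thread sees after another thread
// has switched its own current stream to 2.
int stream_seen_after_other_thread_switches() {
    set_current_stream(1);  // calling thread: current stream is now 1

    std::thread other([] {
        set_current_stream(2);        // changes only the other thread's copy
        assert(current_stream == 2);  // the other thread sees its own value
    });
    other.join();

    return current_stream;  // unchanged in the calling thread
}
```

Calling `stream_seen_after_other_thread_switches()` returns 1, which matches what the profiler showed: work queued by the first thread stays on stream 1 no matter what the second thread sets.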

Thanks for replying.
One more question.
Then what’s the difference between setCurrentCUDAStream and StreamGuard?
Under what circumstances do you use StreamGuard?

The guard is useful to make sure that when you leave the current scope, the previous stream is restored (even if an error is thrown in the current scope).
That way you can be sure that you don’t “pollute” the current stream of your parent function.
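
The difference can be sketched with a toy RAII guard in plain C++. This is an illustration under assumptions, not the real StreamGuard: an int stands in for a stream, and `active_stream`, `ToyStreamGuard`, and the helper functions are hypothetical names. The point is only the restore-on-scope-exit pattern, including when an exception is thrown.

```cpp
#include <stdexcept>

// Toy model only: an int stands in for a CUDA stream handle.
thread_local int active_stream = 0;

// Hypothetical guard: switches the stream on construction and
// restores the previous one on destruction.
class ToyStreamGuard {
public:
    explicit ToyStreamGuard(int id) : previous_(active_stream) {
        active_stream = id;
    }
    ~ToyStreamGuard() { active_stream = previous_; }

    ToyStreamGuard(const ToyStreamGuard&) = delete;
    ToyStreamGuard& operator=(const ToyStreamGuard&) = delete;

private:
    int previous_;  // stream that was current before the guard
};

// Plain setter: the change outlives the function and leaks to the caller.
void child_with_setter() { active_stream = 2; }

// Guarded: the change is undone when the scope ends, here via an exception.
void child_with_guard() {
    ToyStreamGuard guard(2);
    throw std::runtime_error("simulated failure while stream 2 is current");
}

// Stream the caller sees after a child used the plain setter.
int stream_after_setter_child() {
    active_stream = 1;
    child_with_setter();
    return active_stream;
}

// Stream the caller sees after a guarded child threw.
int stream_after_guarded_child() {
    active_stream = 1;
    try {
        child_with_guard();
    } catch (const std::runtime_error&) {
        // Stack unwinding has already run the guard's destructor.
    }
    return active_stream;
}
```

In this model `stream_after_setter_child()` returns 2 (the parent's stream was changed behind its back), while `stream_after_guarded_child()` returns 1 (the parent's stream was restored even though the child threw).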