In my libtorch C++ project, two threads are used, each with its own CUDA stream. When thread 2's stream tries to use a tensor produced by thread 1's stream, the tensor values seen in thread 2 are usually not what I expect. Any ideas? Thanks
It sounds as if you are missing synchronization in your code.
Is synchronization necessary between CUDA streams?
Yes, as explained in the docs:
Operations inside each stream are serialized in the order they are created, but operations from different streams can execute concurrently in any relative order, unless explicit synchronization functions (such as wait_stream()) are used. For example, the following code is incorrect:
```python
cuda = torch.device('cuda')
s = torch.cuda.Stream()  # Create a new stream.
A = torch.empty((100, 100), device=cuda).normal_(0.0, 1.0)
with torch.cuda.stream(s):
    # sum() may start execution before normal_() finishes!
    B = torch.sum(A)
```
When the “current stream” is the default stream, PyTorch automatically performs necessary synchronization when data is moved around, as explained above. However, when using non-default streams, it is the user’s responsibility to ensure proper synchronization.
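In libtorch C++, the usual way to enforce this ordering is a CUDA event: the producer stream records the event after its work, and the consumer stream blocks on it before touching the tensor. Below is a minimal single-threaded sketch of that pattern (stream names, tensor shapes, and the use of `getStreamFromPool` are illustrative, not from your code); in your two-thread setup, the event would be recorded on thread 1 and blocked on by thread 2, with the event handoff itself protected by your own host-side synchronization (e.g. a mutex or condition variable):

```cpp
#include <torch/torch.h>
#include <ATen/cuda/CUDAEvent.h>
#include <c10/cuda/CUDAStream.h>
#include <c10/cuda/CUDAGuard.h>

int main() {
  if (!torch::cuda::is_available()) return 0;

  // Two independent streams, standing in for the two threads' streams.
  auto stream1 = c10::cuda::getStreamFromPool();
  auto stream2 = c10::cuda::getStreamFromPool();

  torch::Tensor t;
  at::cuda::CUDAEvent done;  // marks completion of stream1's work on t

  {
    c10::cuda::CUDAStreamGuard guard(stream1);  // "thread 1" work
    t = torch::randn({100, 100}, torch::kCUDA);
    done.record(stream1);  // record after the producing kernels are enqueued
  }

  {
    c10::cuda::CUDAStreamGuard guard(stream2);  // "thread 2" work
    done.block(stream2);   // stream2 waits for the event before proceeding
    // Tell the caching allocator t's memory is also in use on stream2,
    // so it is not reused prematurely once t is freed.
    t.record_stream(stream2);
    auto s = t.sum();      // now safe: normal_/randn has finished
  }
  return 0;
}
```

`done.block(stream2)` is a device-side wait (it does not block the host thread), so it is cheap; the `record_stream` call matters whenever a tensor allocated on one stream outlives its use on another.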