How does torch.cuda.synchronize() behave?

  1. Work from independent processes is serialized (CUDA MPS being a possible exception). Process A doesn’t know anything about process B, so a synchronize() (or cudaDeviceSynchronize) call only synchronizes the work of the current process. However, if process B uses the GPU as well, e.g. for display output, you might see a latency increase depending on when the context switch occurs.

  2. It depends on what your Python code is doing. If you are only scheduling work, no synchronizations are needed and none will be added. On the other hand, if you are e.g. printing the value of a CUDA tensor, an implicit sync is added, since the value has to be computed and copied to the CPU first.
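A minimal sketch of the difference (the sizes and the `timed` helper are just illustrative; it assumes a CUDA-capable build and guards the GPU part with `torch.cuda.is_available()`):

```python
import time
import torch

def timed(fn):
    # Wall-clock timing around a call. Without a synchronize, this
    # measures only how long it takes to *enqueue* the kernels.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start

if torch.cuda.is_available():
    x = torch.randn(4096, 4096, device="cuda")

    # 1) Pure scheduling: the matmul is queued asynchronously and the
    #    call returns almost immediately, so this time is misleading.
    enqueue_time = timed(lambda: x @ x)

    # 2) Explicit sync: torch.cuda.synchronize() blocks until all
    #    queued GPU work has finished, giving the real execution time.
    def matmul_and_wait():
        x @ x
        torch.cuda.synchronize()
    real_time = timed(matmul_and_wait)

    # 3) Implicit sync: reading a value back to the CPU (print, .item(),
    #    .cpu()) forces the pending computation to finish first.
    y = (x @ x).sum()
    print(y.item())  # blocks until the matmul and the sum are done
```

Note that `enqueue_time` will typically be far smaller than `real_time` for the same operation, which is why explicit synchronization (or `torch.cuda.Event` timing) is needed when benchmarking GPU code.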
