When does a CUDA operator synchronize in PyTorch?

Hello, I am a PyTorch user (1.0.1). I would like to ask a question about CUDA operators that has troubled me for a long time and for which I have not found an answer.

Many CUDA operators, such as “prelu_cuda” in “aten/src/ATen/native/cuda/Activation.cu”, call “at::cuda::getCurrentCUDAStream(curDevice)” but never call “cudaStreamSynchronize(stream)”. I know that CUDA operators are asynchronous, but it seems that “cudaStreamSynchronize(stream)” should reasonably be called somewhere. The only place I found it is in “aten/src/ATen/native/cuda/Copy.cu”, and that happens only when copying from CPU to CUDA or from CUDA to CPU.

When running a network, couldn’t the CPU release memory allocated for CUDA too early, for example by calling cudaFree while the CUDA computation has not happened yet? The release of these CUDA resources is triggered by the destruction of temporary tensors in the network (if I am wrong, please correct me, thank you). Wouldn’t that cause serious problems?

So I want to know how PyTorch guarantees that the CUDA computation has already finished when a temporary tensor is released. I think “cudaStreamSynchronize” should be called once before the intermediate tensor is released. I don’t know if you will reply, but I am still looking forward to your help with this question that has troubled me for a long time.

Thank you very much!
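To make the scenario concrete, here is a minimal sketch of the pattern described above: an asynchronous kernel launch followed by deletion of a temporary tensor. The comments reflect my understanding of PyTorch's caching allocator (deleting a tensor returns its block to the allocator's pool rather than calling cudaFree immediately, and reuse is ordered with respect to the stream), which is an assumption not shown in this thread:

```python
import torch
import torch.nn.functional as F

# Run on CUDA when available; the same calls run synchronously on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(8, 8, device=device)       # temporary input tensor
w = torch.full((1,), 0.25, device=device)  # PReLU weight (single shared slope)
y = F.prelu(x, w)                          # on CUDA: enqueued asynchronously

# Deleting the temporary does NOT call cudaFree while the kernel may still
# be reading it: the block goes back to the caching allocator's pool, and
# reuse of that memory is ordered with respect to the stream (assumption
# about the caching allocator, per my understanding).
del x

if device == "cuda":
    torch.cuda.synchronize()  # explicit sync only needed before host-side reads
```

So in this sketch no per-operator cudaStreamSynchronize is required for correctness; synchronization only matters when the host actually needs the result.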

That is correct, and synchronizations should only be performed if necessary.

If I’m not mistaken, the synchronization happens here.
Are you seeing any issues with using CUDA or is this more of a general question?
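As a small illustration of where an implicit synchronization does occur: a device-to-host copy (the Copy.cu path mentioned in the question) must wait for pending kernels before the CPU can read the bytes. This is a hedged sketch; on a CPU-only machine the copy is trivially synchronous:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

a = torch.ones(4, device=device) * 3  # on CUDA: launched asynchronously
b = a.cpu()        # on CUDA: blocks until the multiply has finished (Copy.cu path)
val = b[0].item()  # safe to read on the host now
```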