Sync inside forward() in Module?

In nvprof I see a Synchronize() call, even though I never synchronize after creating a non-default, non-blocking stream.
Does this happen inside the module's forward() function?
If so, where exactly in the code does it happen?

It depends on your model implementation and the operations it uses.
To narrow it down, you could add NVTX markers (via torch.cuda.nvtx.range_push and torch.cuda.nvtx.range_pop) and check in the profiler timeline which marked region the synchronization falls into.
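A minimal sketch of the marker approach, assuming a hypothetical two-stage model (`TwoStageNet` and the helper `nvtx_range` are illustrative names, not part of PyTorch). The helper wraps `torch.cuda.nvtx.range_push`/`range_pop` in a context manager so every push is paired with a pop even if an exception is raised, and it skips the markers when CUDA is not available. Each labeled range should then appear in the profiler timeline, and a synchronization call that falls inside it was issued while that part of `forward()` was running.

```python
import torch
import torch.nn as nn
from contextlib import contextmanager


@contextmanager
def nvtx_range(name):
    """Wrap a code region in a named NVTX range (hypothetical helper).

    Markers are only emitted when CUDA is available, so the same model
    code also runs on a CPU-only machine.
    """
    use_nvtx = torch.cuda.is_available()
    if use_nvtx:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if use_nvtx:
            torch.cuda.nvtx.range_pop()


class TwoStageNet(nn.Module):
    # Hypothetical model, used only to show where markers can go.
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.ReLU())
        self.head = nn.Linear(8, 10)

    def forward(self, x):
        with nvtx_range("features"):
            x = self.features(x)
        with nvtx_range("pool_and_head"):
            x = x.mean(dim=(2, 3))  # global average pool -> (N, 8)
            x = self.head(x)
        return x
```

Narrowing the ranges step by step (splitting a range that contains the sync into smaller ones) should eventually isolate the single operation responsible.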

I am using a model provided by torchvision.
Is NVTX the only way to do this?

With nvprof I can see that cudaDeviceSynchronize() or cudaStreamSynchronize() is called, but not which part of the code calls it.
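Since a torchvision model's `forward()` is not convenient to edit, one option (a sketch, not an official PyTorch utility) is to attach NVTX ranges from the outside using module hooks: a forward pre-hook pushes a range named after each submodule and a forward hook pops it again. `add_nvtx_hooks` is a hypothetical helper; its `push`/`pop` arguments default to `torch.cuda.nvtx.range_push`/`range_pop` and exist only so the sketch can be exercised without a CUDA build. Because NVTX ranges and CUDA API calls are both recorded on the calling CPU thread, a synchronize call in the timeline should fall inside the range of the innermost module that was running when it was issued.

```python
import torch
import torch.nn as nn


def add_nvtx_hooks(model, push=None, pop=None):
    """Wrap every submodule's forward in an NVTX range (hypothetical helper).

    Returns the hook handles so the hooks can be removed after profiling.
    """
    push = push or torch.cuda.nvtx.range_push
    pop = pop or torch.cuda.nvtx.range_pop
    handles = []
    for name, module in model.named_modules():
        # The root module has an empty name, so fall back to its class name.
        label = name or type(model).__name__

        # Bind the label as a default argument so each hook keeps its own name.
        def pre_hook(mod, inputs, label=label):
            push(label)

        def post_hook(mod, inputs, output):
            pop()

        handles.append(module.register_forward_pre_hook(pre_hook))
        handles.append(module.register_forward_hook(post_hook))
    return handles
```

With a torchvision model, usage might look like `handles = add_nvtx_hooks(torchvision.models.resnet18().cuda())` followed by a forward pass under nvprof, then `h.remove()` on each handle afterwards. Tensor operations called directly in a module's `forward()` (rather than through a submodule) are attributed to that module's range. PyTorch also provides `torch.autograd.profiler.emit_nvtx()`, a context manager that emits an NVTX range for each autograd operation, which may give finer resolution without any hooks.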