Hi, when running certain operations, say starting a timer, a sync is usually required to ensure correct behavior. However, I have seen several examples that sync after creating a tensor and transferring it to a CUDA device (and nothing before). Is that actually necessary, assuming non_blocking=False? Also, isn’t a non-blocking transfer generally a synchronization point?
No, you won’t need a full device sync when transferring data. The non_blocking argument allows an async transfer w.r.t. the host. To move the data async w.r.t. other CUDA ops you would need to use a custom CUDAStream and synchronize it with the default stream before using the data, which is an advanced use case.
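For anyone who wants to see what the custom-stream case could look like, here is a minimal sketch, not a definitive implementation. The helper name copy_on_side_stream is hypothetical, and the sketch assumes the host tensor can be pinned (an async HtoD copy needs pinned host memory). Without a CUDA device it simply returns the input, so the snippet still runs on a CPU-only machine.

```python
import torch


def copy_on_side_stream(cpu_tensor):
    """Copy a host tensor to the GPU on a side stream (hypothetical helper).

    Returns the input unchanged when CUDA is not available.
    """
    if not torch.cuda.is_available():
        return cpu_tensor

    device = torch.device("cuda")
    side_stream = torch.cuda.Stream(device=device)

    # Pinned (page-locked) host memory is needed for a truly async HtoD copy.
    pinned = cpu_tensor.pin_memory()

    # Queue the copy on the side stream so it can overlap with work
    # that is already queued on the current (usually default) stream.
    with torch.cuda.stream(side_stream):
        gpu_tensor = pinned.to(device, non_blocking=True)

    current = torch.cuda.current_stream(device)
    # Make the current stream wait until the copy on the side stream is done.
    current.wait_stream(side_stream)
    # Tell the caching allocator that the tensor is also used on this stream.
    gpu_tensor.record_stream(current)
    return gpu_tensor


x = torch.arange(8, dtype=torch.float32)
y = copy_on_side_stream(x)
result = (y * 2).cpu()  # safe to use y here: the stream dependency is in place
```

Note that wait_stream only orders the two streams on the device; it does not block the host, so no full device sync is involved.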
Thank you. Of course I was referring to HtoD transfers, not DtoD. I should have made that clearer. Just to reiterate the latter point: a (blocking) HtoD transfer will be a synchronization point (for the respective stream, usually the default one), or am I wrong?
My explanation also refers to HtoD copies, not DtoD, so I’m unsure what causes the misunderstanding.
Yes, if non_blocking=True is used, the data will be copied using the default stream asynchronously w.r.t. the host.
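As a rough illustration of the timing pattern from the original question, here is a sketch under stated assumptions: the function name timed_matmul and the matrix size are made up for the example, and it falls back to the CPU when no CUDA device is present. The sync before the timer only matters when async work (such as a non_blocking=True copy from pinned memory) may still be in flight. The sync after the kernel is what makes the host-side measurement meaningful.

```python
import time

import torch


def timed_matmul(n=256):
    """Time a matmul after an async HtoD copy (hypothetical example)."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    a = torch.randn(n, n)
    if use_cuda:
        # Pinned memory lets non_blocking=True return before the copy finishes.
        a = a.pin_memory()
    a_dev = a.to(device, non_blocking=True)

    if use_cuda:
        # Drain the pending copy so it is not counted in the measurement.
        torch.cuda.synchronize()
    start = time.perf_counter()

    out = a_dev @ a_dev

    if use_cuda:
        # Kernel launches are async w.r.t. the host; wait before reading the clock.
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return out, elapsed


out, elapsed = timed_matmul()
print(f"matmul took {elapsed:.6f} s on {out.device}")
```

Because the matmul is queued on the same stream as the copy, the result is correct even without the first sync; that sync only keeps the transfer time out of the measured interval.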
Ah, I see. I interpreted “async w.r.t. other CUDA ops” as referring to a second device. So, never mind. Thank you.