Understanding asynchronous execution


(Konpat Ta Preechakul) #1

It is said in https://pytorch.org/docs/master/notes/cuda.html that GPU operations are asynchronous: operations are enqueued on the device and executed in parallel. But there is also a caveat that this happens under the hood, and users can treat the execution as if it were synchronous.

If I understand correctly, when the user demands the result of an operation, it cannot be deferred any longer: the queued operations must actually be executed, which strips away a chance for better optimization.

Which operations force execution like this? And what are some guidelines for getting the most out of this asynchronous execution?


(Roy Li) #2

Sorry, I’m a bit confused about what you’re asking. What do you mean by “user demands the result of an operation”?


(Konpat Ta Preechakul) #3

For example, “print(tensor)”: if the user demands this, it must block whatever expressions come after it, must it not?

Are there some other expressions of that kind?

I think my real question is: what does a tensor “represent” from the Python perspective? If I create a tensor, do I just get a placeholder rather than a real array of values? Whatever I do to that placeholder just gives me another placeholder, and all the operations are scheduled and optimized under the hood. Only when I demand the result in some non-PyTorch form does it block until the placeholder is resolved.


(Roy Li) #4

Operations that require a synchronize will block (see cudaStreamSynchronize and cudaDeviceSynchronize). In particular, device-to-host transfers require a synchronize, which is why print will block.
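A small experiment illustrating the point above (a sketch, assuming a CUDA device is available; the sizes and timings are just for demonstration): the kernel launch returns almost immediately because the work is only enqueued, while a device-to-host read such as `.item()` blocks until the computation has actually finished.

```python
import time
import torch

if torch.cuda.is_available():
    x = torch.randn(4000, 4000, device="cuda")

    t0 = time.perf_counter()
    y = x @ x           # enqueued asynchronously; returns before the GPU is done
    launch_time = time.perf_counter() - t0

    t0 = time.perf_counter()
    _ = y[0, 0].item()  # device-to-host copy: blocks until the matmul completes
    sync_time = time.perf_counter() - t0

    print(f"launch: {launch_time:.6f}s, sync read: {sync_time:.6f}s")
    # torch.cuda.synchronize() blocks the same way, without doing a transfer.
else:
    print("No CUDA device; on CPU every op runs synchronously.")
```

Typically the launch time is orders of magnitude smaller than the synchronized read, which is the asynchrony made visible.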

Tensors are backed by a storage object that holds a pointer to the data, which can live on either the GPU or the CPU.


(Konpat Ta Preechakul) #5

That makes sense. Does it mean that appending a tensor to a Python list could be done promptly, without the need for synchronization?
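A sketch of the pattern being asked about (the loop and variable names are made up for illustration): appending loss tensors to a list only stores tensor handles, so no device-to-host copy, and hence no synchronization, happens inside the loop. The single `.cpu()` call at the end is the only blocking point.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

losses = []
for step in range(5):
    pred = torch.randn(10, device=device)
    loss = (pred ** 2).mean()
    losses.append(loss)             # stores the tensor handle; does not block

result = torch.stack(losses).cpu()  # one synchronizing transfer at the end
print(result)
```

Calling `loss.item()` inside the loop instead would force a synchronization on every iteration, which is a common accidental performance pitfall.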


(Roy Li) #6

Yeah, that should be correct. (Unless there’s some device-to-host transfer going on there, but I don’t think that’s the case.)