Does DCG compute asynchronously?

qbx2 · July 15, 2017, 2:03pm

Hello, I profiled my code using cProfile, and the result looked odd. One operation (sum) took most of the execution time. When I added time.sleep(0.02) before the operation, the time taken by the operation decreased. I suspect this is because the operation waits for all pending computations to finish. Is that one of the properties of DCG?

hughperkins · July 16, 2017, 2:43am

Am I right in thinking the sum was a full reduction? If so, it is slow because it implicitly copies the result hostside, which causes a sync point.

By the way, you are right that GPU operations are async by default, in the absence of any kind of sync point, such as reading data to hostside.
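
The effect described in the question can be reproduced without a GPU. The following is only an analogy, not PyTorch code: a single-worker `ThreadPoolExecutor` stands in for the device queue, the hypothetical `slow_kernel` stands in for a GPU operation, and `pending.result()` plays the role of the hostside read. The 50 ms and 100 ms durations are arbitrary assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_kernel(x):
    """Hypothetical stand-in for a GPU operation that takes about 50 ms."""
    time.sleep(0.05)
    return x * 2


def time_read(pause):
    """Queue the work, pause on the host, then time only the blocking read."""
    with ThreadPoolExecutor(max_workers=1) as device:
        pending = device.submit(slow_kernel, 21)  # returns immediately (async launch)
        time.sleep(pause)                         # host is busy with something else
        start = time.perf_counter()
        value = pending.result()                  # sync point: blocks until the work is done
        waited = time.perf_counter() - start
    return value, waited


value_a, wait_no_pause = time_read(0.0)
value_b, wait_with_pause = time_read(0.1)
print(f"read without pause blocked for {wait_no_pause * 1000:.1f} ms")
print(f"read after a pause blocked for {wait_with_pause * 1000:.1f} ms")
```

With no pause, the read absorbs the whole wait, so a profiler attributes the time to the read. With a pause longer than the queued work, the work has already finished and the read returns almost immediately, which matches what the `time.sleep(0.02)` experiment showed.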

qbx2 · July 16, 2017, 6:53am

Yes, sum is a full reduction. Thanks for the clear answer.

Then are cpu operations synchronous?

EDIT: Why does full reduction copy the result to the host?

albanD · July 16, 2017, 9:48am

Hi,

The CPU operations are synchronous. Only the GPU operations are asynchronous.

The full reduction returns a number, and to be able to return this number, it has to wait for the computation to be done.

hughperkins · July 16, 2017, 12:07pm

There are two possible ‘why?’ questions here:

what is the technical underlying reason?
why is it like this?

The technical underlying reason is that anything that causes a ‘read’ of an actual concrete value from the GPU causes a sync point. Operations returning torch tensors don’t necessarily force sync points. However, reduce all, in its current implementation, returns a scalar float rather than a tensor. This forces a sync point.
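
A minimal sketch of the difference between keeping a result as a tensor and reading it on the host. It assumes a recent PyTorch release, where `sum()` returns a zero-dimensional tensor and the hostside read happens at `.item()`, unlike the version discussed here, where `sum()` returns a float directly. The helper name `time_sum` and the matrix size are made up for illustration, and the code falls back to the CPU (where everything is synchronous) if no GPU is available.

```python
import time

import torch


def time_sum(read_value):
    """Time a full reduction, with or without reading the value on the host."""
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    x = torch.randn(1024, 1024, device=device)
    y = x @ x  # on a GPU this is queued and the call can return before it finishes

    start = time.perf_counter()
    total = y.sum()           # still a tensor on the device, no concrete value read yet
    if read_value:
        total = total.item()  # copies the number to the host: sync point
    elapsed = time.perf_counter() - start

    if device.type == "cuda":
        torch.cuda.synchronize()  # drain the queue so the next measurement starts clean
    return total, elapsed


lazy_total, lazy_time = time_sum(read_value=False)
host_total, host_time = time_sum(read_value=True)
print(f"sum kept as tensor: {lazy_time * 1000:.3f} ms")
print(f"sum read on host:   {host_time * 1000:.3f} ms")
```

On a GPU, the second measurement would be expected to include the time of the pending matrix multiply, because `.item()` cannot return until that work is done. For timing GPU code in general, calling `torch.cuda.synchronize()` before reading the clock avoids charging earlier work to whichever line happens to sync first.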

Why does reduce all return a scalar, rather than a tensor? I don’t actually know, but I guess some combination of:

maybe torch was written before GPUs were widely available, and on the CPU, making reduce all return a float seems not unreasonable?
torch is written by conv net guys, and for conv nets, reduce all causing a sync point is almost unnoticeable in practice

qbx2 · July 16, 2017, 3:28pm

That’s exactly what I wondered. I didn’t even know sum returns a float.
Thanks!