Are there any tools that can help to visualize and debug parallelism problems in my models?
For example:
I would love to see a chart showing when asynchronous kernels actually started and finished execution, whether there are any data dependencies between them, whether a kernel launch was delayed because of memory pressure, and so on.
I would like to make sure that my code is not accidentally causing CPU/GPU synchronization by converting the result of a GPU computation into a plain Python/NumPy type (or by performing any other operation that causes synchronization).
Thanks, I’ve exported a Chrome trace from the profiler, but I’m not sure how to interpret it. What are those “cpu_to_cuda” event lines? What are “Outgoing flow” and “Preceding/Following events”, which appear in the lower panel when I click on something?
Also, this does not seem to help with my second problem (unintentional synchronization).
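For the trace question, here is a minimal sketch of how the exported file could be inspected outside the viewer. It assumes the export follows the Chrome Trace Event Format (a JSON array of events, or an object with a "traceEvents" key) and that the “cpu_to_cuda” lines are flow events, meaning a start event with phase "s" and a finish event with phase "f" that share an id. I have not confirmed that second assumption against the exporter. The SAMPLE data and the helper names (load_events, summarize, flow_pairs) are made up for illustration.

```python
import json
from collections import Counter


def load_events(text):
    """Return the event list from Chrome trace JSON text (array or object form)."""
    data = json.loads(text)
    if isinstance(data, dict):
        return data.get("traceEvents", [])
    return data


def summarize(events):
    """Count events per (phase, category) pair."""
    return Counter((e.get("ph"), e.get("cat", "")) for e in events)


def flow_pairs(events):
    """Pair flow-start ("s") and flow-finish ("f") events sharing category and id."""
    starts = {
        (e.get("cat"), e.get("id")): e for e in events if e.get("ph") == "s"
    }
    pairs = []
    for e in events:
        key = (e.get("cat"), e.get("id"))
        if e.get("ph") == "f" and key in starts:
            pairs.append((starts[key], e))
    return pairs


# Hypothetical trace: one CPU-side op, one GPU-side kernel, and one flow
# linking them. Names, timestamps and ids are invented for illustration.
SAMPLE = json.dumps([
    {"name": "matmul", "ph": "X", "ts": 100, "dur": 20,
     "pid": "CPU functions", "tid": 1},
    {"name": "cpu_to_cuda", "ph": "s", "ts": 100,
     "pid": "CPU functions", "tid": 1, "id": 7, "cat": "cpu_to_cuda"},
    {"name": "kernel", "ph": "X", "ts": 115, "dur": 40,
     "pid": "CUDA functions", "tid": 0},
    {"name": "cpu_to_cuda", "ph": "f", "ts": 115,
     "pid": "CUDA functions", "tid": 0, "id": 7, "cat": "cpu_to_cuda"},
])


if __name__ == "__main__":
    events = load_events(SAMPLE)
    print(summarize(events))
    for start, finish in flow_pairs(events):
        # Gap between the flow start and finish timestamps, in trace time units.
        print(start["pid"], "->", finish["pid"], "gap:", finish["ts"] - start["ts"])
```

To run this on a real export, read the file's contents and pass them to load_events in place of SAMPLE. If the flow assumption holds, each pair would link a CPU-side event to the GPU-side event it triggered, which may be what the viewer reports as “Outgoing flow”.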