Sorry @ptrblck for not following up on this. It dropped down in my backlog for a while. I think I've finally figured out how to do GPU->CPU transfers asynchronously. Calling tensor.to() doesn't seem to allow you to specify the output buffer or request that the output be placed in pinned memory, so using tensor.to() for GPU->CPU transfer is always synchronous. However, I think I was able to get an async transfer by creating a pinned-memory buffer on the CPU and using pinned_cpu_tensor.copy_(gpu_tensor, non_blocking=True). Can you confirm this is the correct way to achieve asynchronous GPU->CPU data transfer?
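
For reference, a minimal sketch of the pattern described above. The helper name copy_to_cpu_async and the example tensor are hypothetical, made up for illustration. It assumes the copy is issued on the current CUDA stream and falls back to a plain clone when the source tensor is not on a GPU. Recording an event after the copy is one way (not the only way) to know when the pinned buffer is safe to read.

```python
import torch


def copy_to_cpu_async(src):
    """Start a GPU->CPU copy of `src` into a pinned buffer (hypothetical helper).

    Returns (cpu_buffer, event). `event` is None when no CUDA copy was issued.
    """
    if not src.is_cuda:
        # Source is already on the host, so there is nothing to overlap.
        return src.clone(), None
    # Page-locked (pinned) host memory is what lets the copy run asynchronously.
    buf = torch.empty(src.shape, dtype=src.dtype, device="cpu", pin_memory=True)
    buf.copy_(src, non_blocking=True)
    # Record an event on the current stream so we can wait for this copy only.
    event = torch.cuda.Event()
    event.record()
    return buf, event


device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.arange(8, dtype=torch.float32, device=device)
out, done = copy_to_cpu_async(x)
# ... other CPU work can overlap with the transfer here ...
if done is not None:
    done.synchronize()  # do not read `out` before this returns
print(out.sum().item())
```

Reading the pinned buffer before the event (or the stream) has been synchronized can return stale data, so the synchronize call is the part that is easy to forget.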