Here is a related discussion FYR:
Here is the TLDR:
- `gpu_tensor.to(device="cpu", non_blocking=True)` will asynchronously copy a tensor from the GPU into pinned CPU memory.
- As you point out, `pinned_cpu_tensor.copy_(gpu_tensor, non_blocking=True)` will asynchronously copy a tensor from the GPU into an existing pinned CPU tensor.
- Once the asynchronous copy has been launched, you need to call `torch.cuda.Stream.synchronize()` on the stream that issued it (e.g. `torch.cuda.current_stream().synchronize()`) or `torch.cuda.synchronize()` to ensure the copy has finished before using its contents.
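A minimal sketch of the points above, assuming PyTorch with a CUDA device. `copy_to_host_async` and `main` are hypothetical helper names for illustration, not PyTorch APIs. On a CPU-only machine the demo is skipped, and the helper simply performs an ordinary CPU copy.

```python
import torch


def copy_to_host_async(gpu_tensor, out=None):
    """Launch a GPU -> CPU copy without blocking the host (hypothetical helper).

    If `out` is given, it should be a pinned CPU tensor with the same shape and
    dtype; otherwise `.to(device="cpu", non_blocking=True)` allocates the
    destination. The result must not be read until the copy's stream has been
    synchronized.
    """
    if out is not None:
        # Copy into an existing (ideally pinned) CPU buffer.
        out.copy_(gpu_tensor, non_blocking=True)
        return out
    # Let PyTorch allocate the CPU destination tensor.
    return gpu_tensor.to(device="cpu", non_blocking=True)


def main():
    if not torch.cuda.is_available():
        print("CUDA not available; skipping the asynchronous copy demo.")
        return

    gpu_tensor = torch.arange(1_000_000, device="cuda", dtype=torch.float32)

    # Variant 1: .to() with non_blocking=True.
    host_a = copy_to_host_async(gpu_tensor)

    # Variant 2: copy_() into a pre-allocated pinned buffer.
    pinned = torch.empty(gpu_tensor.shape, dtype=gpu_tensor.dtype, pin_memory=True)
    host_b = copy_to_host_async(gpu_tensor, out=pinned)

    # Both copies were queued on the current stream; wait before reading them.
    torch.cuda.current_stream().synchronize()  # or: torch.cuda.synchronize()

    # Safe to read the host tensors only after the synchronize call.
    assert torch.equal(host_a, gpu_tensor.cpu())
    assert torch.equal(host_b, host_a)
    print("Both asynchronous copies completed and match.")


if __name__ == "__main__":
    main()
```

`torch.cuda.current_stream().synchronize()` waits only for work queued on that one stream, whereas `torch.cuda.synchronize()` waits for all streams on the device, so the stream-level call is the narrower option when you know which stream issued the copy.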