How to maximize CPU <==> GPU memory transfer speeds?

PyTorch uses cudaHostAlloc to allocate pinned (page-locked) host memory (e.g. via Tensor.pin_memory()), but you can also call cudaHostRegister through torch.cuda.cudart() in case you want to manage the host allocation yourself (for example an mmap with MAP_LOCKED) and just pin it afterwards for fast transfers.
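A minimal sketch of both paths, assuming the buffer sizes are illustrative and that the flag value 0 passed to cudaHostRegister corresponds to cudaHostRegisterDefault:

```python
import torch

device = torch.device("cuda")

# Path 1: let PyTorch allocate pinned host memory (cudaHostAlloc under the hood).
# Pinned buffers enable async DMA, so non_blocking=True can overlap the copy
# with other work on the current stream.
src = torch.randn(1 << 20).pin_memory()
dst = src.to(device, non_blocking=True)

# Path 2: pin an existing, self-managed host buffer with cudaHostRegister.
# The flag 0 is assumed to be cudaHostRegisterDefault; adjust if you need
# portable or mapped registration.
buf = torch.randn(1 << 20)  # ordinary pageable host tensor
ret = torch.cuda.cudart().cudaHostRegister(
    buf.data_ptr(), buf.numel() * buf.element_size(), 0
)
torch.cuda.check_error(ret)  # raises if registration failed

dst2 = buf.to(device, non_blocking=True)  # now copies like a pinned buffer
torch.cuda.synchronize()

# Unregister before freeing or repurposing the host buffer.
torch.cuda.check_error(torch.cuda.cudart().cudaHostUnregister(buf.data_ptr()))
```

Either way, the main speedup for host-to-device and device-to-host copies comes from the memory being page-locked, which lets the driver use async DMA instead of staging through an internal pinned buffer.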
