How to maximize CPU <==> GPU memory transfer speeds?

@ptrblck

If I run my GPU app under “strace -f”, I don’t see a single mlock-family call (mlock, mlock2, mlockall), nor an mmap call with the MAP_LOCKED flag.

I presume there isn’t a major bug in PyTorch where the memory pinning is simply skipped, which could lead to crashes under stress once the system starts swapping pages. The two mechanisms mentioned above are the only ways I know of to pin memory pages.
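
To make the two mechanisms concrete, here is a minimal Linux-only sketch of what I would expect to see in an strace log if either were used. It only illustrates the two user-space calls and says nothing about how PyTorch or the CUDA driver actually pins host memory. The function names (`pin_with_mlock`, `pin_with_map_locked`, `pin_report`) are made up for illustration, and the numeric value of MAP_LOCKED is an assumption taken from the generic Linux headers (x86-64, arm64); it differs on some other architectures.

```python
import ctypes
import mmap

# Assumption: MAP_LOCKED as defined in the generic Linux headers (x86-64, arm64).
# Python's mmap module does not export this constant, so it is hard-coded here.
MAP_LOCKED = 0x2000


def pin_with_mlock(nbytes=mmap.PAGESIZE):
    """Way 1: allocate anonymous memory, then lock it with mlock(2)."""
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int
    libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.munlock.restype = ctypes.c_int

    buf = mmap.mmap(-1, nbytes)  # anonymous, writable mapping
    view = (ctypes.c_char * nbytes).from_buffer(buf)
    addr = ctypes.addressof(view)

    locked = libc.mlock(addr, nbytes) == 0  # fails e.g. if RLIMIT_MEMLOCK is exceeded
    if locked:
        libc.munlock(addr, nbytes)
    return locked


def pin_with_map_locked(nbytes=mmap.PAGESIZE):
    """Way 2: ask the kernel to lock the pages at mmap(2) time via MAP_LOCKED."""
    flags = mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS | MAP_LOCKED
    try:
        buf = mmap.mmap(-1, nbytes, flags=flags)
    except (OSError, ValueError):
        return False  # the kernel (or a resource limit) refused the locked mapping
    buf.close()
    return True


def pin_report():
    """Try both mechanisms and report which ones succeeded."""
    return {"mlock": pin_with_mlock(), "MAP_LOCKED": pin_with_map_locked()}


if __name__ == "__main__":
    print(pin_report())
```

Running this script under “strace -f” should show an explicit mlock call for the first function and an mmap call carrying MAP_LOCKED for the second, which is the pattern I was looking for in my GPU app and did not find.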