Could anyone explain what might cause swap to be used heavily when using `pin_memory`, and what are some common ways to fix the problem? Is it due to trying to pin too much memory, or some other cause? If you have multiple data workers, does each one pin its own memory (which might be part of the issue, leading to too much pinned memory)?
Additionally, the documentation says:

> Once you pin a tensor or storage, you can use asynchronous GPU copies. Just pass an additional `non_blocking=True` argument to a `cuda()` call. This can be used to overlap data transfers with computation.
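As I understand it, the pattern the docs describe looks roughly like this (a minimal sketch; it falls back to CPU when no CUDA device is available, and the dataset here is just synthetic data for illustration):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic dataset: 64 samples of 3 features each, with scalar targets.
dataset = TensorDataset(torch.randn(64, 3), torch.randn(64))

# pin_memory=True makes the DataLoader copy each batch into page-locked
# (pinned) host memory after the workers produce it, so the
# host-to-device transfer can be asynchronous.
loader = DataLoader(
    dataset,
    batch_size=16,
    pin_memory=torch.cuda.is_available(),
)

device = "cuda" if torch.cuda.is_available() else "cpu"

for x, y in loader:
    # non_blocking=True only actually overlaps the copy with computation
    # when the source tensor is in pinned memory and the target is a GPU.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass would go here ...
```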
If I am not making non_blocking CUDA calls after using `pin_memory`, is there still an advantage to using `pin_memory`?