I can’t pin more than 2 GB at once; to pin more memory I need to break it into multiple chunks.
For example, trying to pin a single 3 GiB buffer raises an error:
import torch
buffer = torch.empty(int(3.0*1024**3), dtype=torch.uint8, pin_memory=True)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
However, pinning even more memory works fine when it is split into multiple chunks:
buffer_1 = torch.empty(int(2.0*1024**3), dtype=torch.uint8, pin_memory=True)
buffer_2 = torch.empty(int(2.0*1024**3), dtype=torch.uint8, pin_memory=True)
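
So as a workaround I allocate pinned memory in chunks. A minimal sketch of the helper I use (the 2 GiB per-allocation limit and the names CHUNK_BYTES / alloc_pinned_chunks are just my own, based on what I observed):

import torch

CHUNK_BYTES = 2 * 1024**3  # assumed per-allocation pinning limit observed on my machine

def alloc_pinned_chunks(total_bytes: int):
    # Allocate total_bytes of pinned host memory as a list of <= 2 GiB uint8 tensors
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        size = min(remaining, CHUNK_BYTES)
        chunks.append(torch.empty(size, dtype=torch.uint8, pin_memory=True))
        remaining -= size
    return chunks

buffers = alloc_pinned_chunks(int(3.0 * 1024**3))  # 3 GiB pinned across two chunks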
I don’t know why this is happening; the machine has more than 1 TB of memory and there is no limit on max locked memory. It also seems to be affecting others; see this discussion in the DeepSpeed GitHub repo.
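
For reference, one quick way to confirm the locked-memory limit from Python (standard library resource module, nothing PyTorch-specific; just a sketch of the check):

import resource

soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print(soft == resource.RLIM_INFINITY, hard == resource.RLIM_INFINITY)  # both True here, i.e. unlimited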
More system info:
OS: Ubuntu 24.04.1
GPU: NVIDIA H100
Python: 3.12
CUDA: 12.8
PyTorch: 2.7.0