Pin_memory limitation

I can’t pin more than 2 GB at once; to pin more memory I need to break it into multiple chunks.
For example, running this raises an error:

import torch
buffer =  torch.empty(int(3.0*1024**3), dtype=torch.uint8, pin_memory=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

However, pinning a larger amount of memory by breaking it into multiple chunks works fine:

buffer_1 = torch.empty(int(2.0*1024**3), dtype=torch.uint8, pin_memory=True)
buffer_2 = torch.empty(int(2.0*1024**3), dtype=torch.uint8, pin_memory=True)
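
For reference, here is a minimal sketch of that workaround: a small helper (the name and chunk size are my own, not from any library) that allocates pinned chunks of at most 2 GiB each and returns them as a list.

import torch

# Sketch of the chunked workaround: build a "large" pinned buffer out of
# pieces that each stay under the ~2 GiB limit observed above.
CHUNK_BYTES = 2 * 1024**3  # 2 GiB per pinned chunk

def pinned_chunks(total_bytes, chunk_bytes=CHUNK_BYTES):
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk_bytes, remaining)
        chunks.append(torch.empty(n, dtype=torch.uint8, pin_memory=True))
        remaining -= n
    return chunks

buffers = pinned_chunks(int(6 * 1024**3))  # three 2 GiB pinned chunks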

I don’t know why this is happening; the machine has more than 1 TB of RAM and there is no limit set on the max locked memory. It also seems to be happening to others, see this discussion in the DeepSpeed GitHub repo.
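
For what it’s worth, the locked-memory limit can be checked from Python as well as with ulimit -l; a quick sketch:

import resource

# RLIMIT_MEMLOCK is the per-process cap on locked (pinned) memory.
# RLIM_INFINITY corresponds to ulimit -l reporting "unlimited".
soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
print("soft:", soft, "hard:", hard)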

More system info:
OS: Ubuntu 24.04.1
GPU: NVIDIA H100
Python: 3.12
CUDA: 12.8
PyTorch: 2.7.0
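
For completeness, most of these versions can also be read from the installed PyTorch build itself (python -m torch.utils.collect_env prints a fuller report):

import torch

# Versions and device as reported by the installed PyTorch build.
print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))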

Not reproducible on my system, so I guess your system might disallow these large allocations:

>>> import torch
>>> buffer =  torch.empty(int(3.0*1024**3), dtype=torch.uint8, pin_memory=True)
>>> buffer.shape
torch.Size([3221225472])

I’m seeing the same issue on my end: allocating a single pinned memory buffer larger than ~2 GB fails with "CUDA error: invalid argument", but multiple smaller chunks work fine. This points to a limitation on the size of individual contiguous pinned memory allocations, rather than on the total amount of pinned memory available. It may be due to driver- or hardware-level constraints, especially with the H100 architecture. Since no locked-memory limits are set, the behavior likely stems from how the CUDA backend handles large pinned memory requests.
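
One way to narrow this down might be to bypass PyTorch and call cudaHostAlloc directly through ctypes: if the raw runtime call also fails above ~2 GB, the limit sits in the CUDA runtime/driver rather than in PyTorch. A rough sketch, assuming libcudart can be found by the dynamic loader (the file may be named libcudart.so or libcudart.so.12 depending on the install):

import ctypes

# Load the CUDA runtime; adjust the name/path if the loader cannot find it.
cudart = ctypes.CDLL("libcudart.so.12")

def try_host_alloc(nbytes):
    # cudaHostAlloc(void **pHost, size_t size, unsigned int flags)
    ptr = ctypes.c_void_p()
    err = cudart.cudaHostAlloc(ctypes.byref(ptr), ctypes.c_size_t(nbytes), 0)  # 0 == cudaHostAllocDefault
    if err == 0:  # cudaSuccess
        cudart.cudaFreeHost(ptr)
    return err

for gib in (1, 2, 3, 4):
    print(f"{gib} GiB -> error code {try_host_alloc(gib * 1024**3)}")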

It would be helpful if anyone knows how to check for such constraints (e.g. an actual driver- or hardware-level limit on the size of a single pinned allocation).