Does not contain CUDA Unified Memory CPU page faults data

Andy_P · January 29, 2024, 8:10am

Dear community,
I replaced cudaMalloc with cudaMallocManaged in c10/cuda/CUDACachingAllocator.cpp in the PyTorch source code (v1.13.0) and compiled it successfully. The modified build works as expected.

When training a GNN, I successfully oversubscribed the GPU memory.

However, when I profile the Python program with nsys using the following command,
nsys profile --stats=true --cuda-um-gpu-page-faults=true --cuda-um-cpu-page-faults=true --show-output=true python reddit.py
no page faults are reported at all.

Analysis results are as follows:

...
[ 9/11] Executing 'um_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[10/11] Executing 'um_total_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.
[11/11] Executing 'um_cpu_page_faults_sum' stats report
SKIPPED: /mypath/report90.sqlite does not contain CUDA Unified Memory CPU page faults data.

Why is that?

Thanks.

hawkheimmer · April 18, 2024, 11:54am

How did you edit CUDACachingAllocator.cpp? Did you just replace cudaMalloc() with cudaMallocManaged(), or did you also replace cudaMalloc_count{}, cudaMallocMaybeCapturing(), release_lock_on_cudamalloc() and cudaMallocAsync()?
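
To narrow down the SKIPPED messages quoted in the original post, it may help to check whether the exported SQLite report contains any Unified Memory page fault tables at all. The following is a minimal sketch, not a definitive diagnostic. It assumes the nsys SQLite export uses the table names CUDA_UM_CPU_PAGE_FAULT_EVENTS and CUDA_UM_GPU_PAGE_FAULT_EVENTS, which may differ between nsys versions, and the helper name um_table_row_counts is hypothetical.

```python
import sqlite3

# Assumed table names from the nsys SQLite export schema.
# They may differ between nsys versions; adjust if your report uses other names.
UM_TABLES = (
    "CUDA_UM_CPU_PAGE_FAULT_EVENTS",
    "CUDA_UM_GPU_PAGE_FAULT_EVENTS",
)


def um_table_row_counts(conn):
    """Map each assumed UM table name to its row count, or None if the table is absent."""
    # sqlite_master lists every table that exists in the database file.
    existing = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    counts = {}
    for table in UM_TABLES:
        if table in existing:
            counts[table] = conn.execute(
                f'SELECT COUNT(*) FROM "{table}"'
            ).fetchone()[0]
        else:
            counts[table] = None
    return counts
```

For example, um_table_row_counts(sqlite3.connect("/mypath/report90.sqlite")) returns a dictionary with one entry per table. A value of None means the table is missing from the report, which would suggest that no page fault events were recorded rather than that the stats reports failed to read them.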